Training: searching statistics on the ONS website


5 February 2012, CILIP HQ (organised by CILIP Information Services Group)

Notes on the day

Geoff Davies, Implementation Manager at the ONS, gave a run-through of the navigation of the newly redesigned ONS website. Recent improvements include new search functionality, additional synonyms and acronyms, and better navigation.

  • Several new elements on the homepage will be useful for headline figures – the “carousel” in the centre which announces the latest big releases, and the Key figures panel on the right which is a quick way of accessing the most up-to-date stats for GDP, unemployment etc.
  • The UK Publication Hub (link at bottom of landing page) holds all government data, not just that held by ONS.
  • ONS YouTube videos give explanations of big releases, and the new interactives are a good way of interrogating data.
  • Links to the previous site are obsolete, so if you’ve saved a URL it won’t redirect to the new site, but all the statistical releases have been carried over, so they will be there if you dig deep enough.

Geoff then outlined the basic structure of the ONS site, which is a simple nested hierarchy:

  • Business area (section) folder -> each publication has a folder -> calendar entry for each edition -> edition folder -> all content “nuggets” released on that date eg. charts, data tables, summary, statistical bulletin etc.
  • Every edition published to the site has a separate release page, which goes live on the publication date (the release calendar includes future publications). Everything relating to that release is accessible from the page – datasets and reference tables are listed at the bottom of the page, and contact details for a named person responsible for that release are to the right.
  • The redesigned theme pages, which are launching shortly and will be rolled out gradually across each theme, are simplified and easier to understand, and much more visual than the current text-based version. A moving carousel, in the centre, gives the most recent data. They are a work in progress and will be improved as more pages are updated.

Geoff gave a quick run-through of the navigation tabs across the top of the site:

  • Browse by theme – alphabetical index of themes -> individual theme pages, with the most relevant or important content at the top.
  • Publications – chronological list, with filters on the right to narrow down content.
  • Data – chronological list, search for datasets and reference tables here (not available in publications list).
  • Release calendar – all releases, chronologically, including future releases (the landing page only includes big releases). If you click through to a release page there’s a link to all editions at top right, to access previous data.
  • Guidance and methodology – gives background on the ONS and data collection, classifications etc.
  • Media Centre – includes official statements and releases, and letters correcting misinterpretations of stats in the media.
  • About ONS – most useful is the ad hoc research undertaken by ONS, which isn’t searchable in the publications indexes. Go to Publication Scheme under What We Do, then Published Ad Hoc Data on the left.

Continuing problems with the site

The main issue users have raised since the redesign is difficulty in finding content. The ONS has decentralised publishing, which means each department is responsible for their own releases (around 460 staff contributing to the site). This has led to inconsistency, as some staff are reluctant to change old methods or not interested in web standards, and some are just too busy. The ONS are working on solutions:

  • training staff to tag content with the six or seven most useful keywords (too few, or too many irrelevant ones, weaken search results), and to improve the metadata.
  • a publishing support team to help departments who are too busy or uninterested.
  • regular health checks run on content.
  • pressure from management to conform to the new standards.

Practical examples

We ran through some real search queries for tips on searching the site, with assistance from a member of the customer services team (whose name I missed, sorry!). The main advice was to search through the release calendar using filters as necessary (selecting ‘last 5 years’ clears future releases from the list), and to use the ‘all editions’ link on each release page to locate time series data.

Unfortunately, the practical examples just proved that the search functionality of the site still needs improvement (if a roomful of information professionals struggles to find data you have a problem!). Advising users to call the customer services team with any queries is helpful but no use in a high pressure environment where data is needed within hours, not days – what I really needed were ways of finding the stats myself.


  • The redesigned site is much cleaner and simpler than the old version, and easier to navigate, but it’s still difficult to actually find specific data. It’s a shame the ONS didn’t take advantage of having a room full of information professionals to interrogate the system further and to make notes of improvements needed.
  • Some of the problems the ONS are facing are familiar – they’ve decentralised uploading of content, but some staff are reluctant to adopt new techniques and others are over-keen and tag excessively. This is true of other new technologies being adopted across many library sectors (certainly it applies to social media in the news industry). It’s an issue of good training and perseverance with the new standards, and having support from management is vital.
  • Some issues with the redesign are similar to those we’ve experienced in relaunching our intranet recently – lack of redirects from old pages, decentralising, need for training.

Applying what I learned

  • The key figures and carousel on the front page of the ONS site will be incredibly useful for finding the most recent headline data quickly (a common query).
  • The new theme pages will be very useful once they are launched, as a quick way to access key figures on a topic (another common query).
  • I’ll bookmark the ad hoc data page as an extra location to check for data.
  • The training also offered some good ideas on how to ensure consistently good content and metadata, which we could apply to any new roles that our department undertakes.

Week in the life: Monday 24 May 2010

Arrived 9.30am (or maybe a little bit after!). My colleague Holly is away this week so I’m doing a few tasks first thing that she would normally do if she was working the early shift (9am).

Normally I’d work a 9.30-5.30 shift, but today I’m leaving a few hours early for a baby scan (exciting!). Not to worry though, it was a busy morning so plenty to write about.

  • Afghanistan casualties – We check the MOD site every morning for details of any new deaths, then add them to the casualty list we store in Google docs. This list is made available internally through the research dept intranet, and externally through the Datablog. There was one casualty over the weekend, so I added his name and details to the list of British casualties, and changed the number of UK deaths on the spreadsheet.
  • Afghanistan wounded – We update this as and when figures are released. I checked the MoD and DoD sites but no further figures have been released.
  • Afghanistan casualties worldwide – We keep a list on the intranet of the casualties suffered by other countries as well as the UK. Checked iCasualties, and updated the list on the intranet.
  • Cuts query for John Harris on Alastair Campbell – used Lexis Nexis to search for profiles/interviews, and a review of his previous diaries book, and emailed nine or ten articles.
  • Corrections – The Guardian prints corrections every day in the Corrections and Clarifications column, and we add them to the relevant articles on our internal text archive. Found today’s corrections in the archive library, then used the Corrections tool to add each correction to the article it is correcting.
  • Wrote up my notes from In Copy training last week – we’re going to start loading the From the archive column onto the website ourselves soon, as it’s been discontinued online (no one else has time or inclination to do it), so we’ve had training on how to do it from In Copy.
  • Uploaded Saturday’s From the archive article as a practice, and ironed out a few glitches with someone from ESD. We’re trying to settle on a uniform style that looks okay but also incorporates good SEO, and so it’s clear it’s an old article not breaking news.
  • From the archive for next week – We try to keep at least two weeks ahead of the From the archive column, which we send off every Thursday for subbing, but we’ve been busy lately so there are a few gaps to fill in. I checked our On this day calendar for ideas, saw that the driving test has been compulsory for 75 years and searched the digital archive for driving test stories around the first week of June 1935. Came up with a House of Commons proceedings report about road safety, not one of those gems you find sometimes but quite interesting anyway. Typed the text into the Google doc we use to compile future articles, edited it to 450 words and entered the details onto the Google calendar so colleagues know I’ve covered that date.
  • Sidebar for the newspaper – The home desk want a panel to go with the Dr Andrew Wakefield story tomorrow, detailing the history of the case, so I checked the wires and Lexis Nexis for previous stories, timelines and dates then wrote it through, with about ten bullet points on the original study, GMC investigations and other key dates.
  • Updating Aristotle – The Guardian’s database of MPs needs updating with the new shadow cabinet posts for Labour MPs, so I copied over the job titles from the Labour website and used the production software to amend each minister’s profile online. Harriet Harman is now officially the acting leader etc.

Resources: Eurostat

A representative of Eurostat’s media support team paid a visit to the office to talk us through their statistics database. It can be hard sometimes to locate specific stats on an unfamiliar website, particularly if you are up against a tight deadline, so it was good to get an overview of the type of data stored and the different ways you can search Eurostat.

  • Eurostat is the central store for EU and EFTA countries’ stats, the central institute of the European Statistics System
  • Data from national statistical institutes (NSIs) go to Eurostat, where they are harmonised and compiled into European aggregates, then disseminated (mainly web only)
  • Euro Indicators are regularly released economic indicators – they always cover the EU and sometimes extend further if comparable stats are available – unemployment, GDP, trade, inflation etc – deficit and debt released every April and Oct
  • Also release ad hocs as and when – GDP per capita, population, tax trends, the yearbook, for Women’s Day etc
  • Don’t cover Eurobarometer surveys (DG Comm), EU budget figures (DG Budget, Inforegio for regional figs), tax rates (DG Taxud – who has highest/lowest VAT etc)
  • Their remit is to report stats independently and neutrally, without a political agenda
  • Release calendar is issued every Oct
  • Time series usually go back to the 1990s across all countries, as that’s when the data was harmonised, but go much further back for some countries
  • DG Ecfin has time series data for some economic indicators but not complete as not comparable
  • Microdata isn’t on the site – could be used to identify individuals eg surveys; can get access for research purposes only

To use the site:

  • Country profiles (from home page) – compare a country to the EU average etc; only most recent data; can then link to a table of data to download
  • Data in Focus (from home page) – online PDF releases of topics of data; links to data files
  • Stats in Focus (from home page) – text, analysis as well as numbers; links to data files
  • Statistics tab – organised by themes and sub-themes; click on a theme, links down the side give related datasets; click through the hierarchy
  • Search – the homepage and the whole site – can search publications, datasets or metadata – or just search in the database

To use the Statistics Database:

  • go into pre-defined tables, click on topic, work through the hierarchy to get a table of data, access to maps, graphs, can set parameters and download
  • go into database, use the data tool to extract data; can set own parameters, display format etc

news:rewired #3: data mashing panel

Tony Hirst (@psychemedia), Open University blogger and sometime Guardian Datastore collaborator

The Guardian Datastore released a spreadsheet of MPs’ expenses data, in a convenient format – BUT it’s hard to make sense of spreadsheets; pictures and graphs tell you more and add value

  • VISUAL REPRESENTATIONS of data are a valuable tool
  • can interact, have conversations with the data, explore stories within it – like mapping travel claims expenses to show where MPs travelled from, colour coded for level of claim

Have to make the data fluid – wire it into other web-based resources to add value

  • take data from Google spreadsheet (add &output=csv  &range=B2:AH684 or whatever)
  • Many Eyes Wikified to visualise data, do calculations
  • might need to clean data (get rid of £, & etc) – Yahoo Pipes – eg can run a regex (regular expression) to replace & with AND  – then output a cleaned CSV file
  • can filter data to pick particular sections eg 1 MP, using theyworkforyou MPs’ info

Google spreadsheet — Yahoo Pipes — Many Eyes Wikified — embed in blog etc.
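The first two steps of that pipeline can be sketched in plain Python. The spreadsheet data below is invented for illustration, and the cleaning step stands in for what the talk did with a regex in Yahoo Pipes:

```python
import csv
import io
import re

# In practice you'd fetch the sheet as CSV with urllib, using the export
# trick from the talk (append &output=csv, and optionally &range=B2:AH684,
# to the spreadsheet URL). This raw text stands in for that download.
raw_csv = "MP,Claim\nSmith & Jones,£1234\nBrown,£567\n"

def clean(text):
    # Yahoo Pipes-style cleanup: regex out the £ signs, swap & for AND
    text = re.sub(r"£", "", text)
    return text.replace("&", "AND")

rows = list(csv.reader(io.StringIO(clean(raw_csv))))
# rows[1] is now ["Smith AND Jones", "1234"] – ready for a visualisation tool
```

The output of a step like this is what you would then hand on to Many Eyes Wikified or a similar tool.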

Location data isn’t always easy to add so use a join

  • Google Fusion Tables / Dabble DB combines two spreadsheets with a common field (exact match) to match data eg MPs’ names and geo coordinates
  • can use Yahoo Pipes to plot coordinates of postcodes on a map (y:location)
  • can plot it on Google Earth
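A minimal Python sketch of that join idea (what Fusion Tables / Dabble DB do): two tables matched on an exact common field, here invented MP names and coordinates:

```python
# Two "spreadsheets" sharing a common field: the MP's name (exact match).
claims = {"Jane Doe MP": 1500, "John Roe MP": 900, "No Coords MP": 200}
coords = {"Jane Doe MP": (51.50, -0.12), "John Roe MP": (53.48, -2.24)}

# Join on the common field; rows without an exact match simply fall out,
# just as they would in Fusion Tables.
joined = {
    name: {"claim": claim, "coord": coords[name]}
    for name, claim in claims.items()
    if name in coords
}
# Each matched MP now carries a claim plus plottable coordinates
```

The exact-match requirement is the catch in practice: "J. Doe" and "Jane Doe MP" won't join, which is why cleaning the data first matters.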

Can query Google spreadsheets if you import data

  • =importhtml(“URL”, “table”, 1) – will import a table from the web
  • can then write queries on it (see Hirst’s recent blog for further details)
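For anyone working outside Google Sheets, the same "import a table from the web" step can be rough-sketched with the Python standard library. The HTML fragment here is invented, standing in for a page you would fetch with urllib:

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect the rows of any <table> cells encountered in an HTML page."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

# Stand-in for a fetched page containing one data table
html = ("<table><tr><th>MP</th><th>Claim</th></tr>"
        "<tr><td>Jane Doe</td><td>1234</td></tr></table>")
grabber = TableGrabber()
grabber.feed(html)
# grabber.rows -> [["MP", "Claim"], ["Jane Doe", "1234"]]
```

This is only a sketch – =importhtml() also handles picking the nth table on a page, which you'd add by counting <table> tags.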

Useful sites for advice and how tos:

Francis Irving (@frabcus) – all about “poking the beast”

New sort of journalism making decisions about datasets, building stories from websites – open FoI, campaign leaflets lead to stories

  • includes voting record analysis
  • for publicly submitted FoI requests eg allotment waiting lists – can search file type:xls for spreadsheets
  • – indexed party campaign leaflets – can browse by constituency, set up email alert, do tineye reverse image lookup eg BNP
  • democracy club – signing up volunteers in every constituency to make election data available (leaflets etc)

Estonia is the goal – they publish all govt data, minutes have to be made available 20 minutes after the end of a meeting


  • Is there an international data source? The UN is tidying up its data at the moment; Wikileaks is good; International Telecomms Union; can do deep Google searching for .xls using site:…
  • Can you strip data from PDFs? OCR software (Adobe?) could strip out basic .jpg data but would have to check for errors