Training: searching statistics on the ONS website


5 February 2012, CILIP HQ (organised by CILIP Information Services Group)

Notes on the day

Geoff Davies, Implementation Manager at the ONS, gave a run-through of the navigation of the newly redesigned ONS website. Recent improvements include new search functionality, additional synonyms and acronyms, and better navigation.

  • Several new elements on the homepage will be useful for headline figures – the “carousel” in the centre which announces the latest big releases, and the Key figures panel on the right which is a quick way of accessing the most up-to-date stats for GDP, unemployment etc.
  • The UK Publication Hub (link at bottom of landing page) holds all government data, not just that held by ONS.
  • ONS YouTube videos give explanations of big releases, and the new interactives are a good way of interrogating data.
  • Links to the previous site are obsolete, so if you’ve saved a URL it won’t redirect to the new site, but all the statistical releases have been carried over, so they will be there if you dig deep enough.

Geoff then outlined the basic structure of the ONS site, which is a simple nested hierarchy:

  • Business area (section) folder -> each publication has a folder -> calendar entry for each edition -> edition folder -> all content “nuggets” released on that date, e.g. charts, data tables, summary, statistical bulletin etc.
  • Every edition published to the site has a separate release page, which goes live on the publication date (the release calendar includes future publications). Everything relating to that release is accessible from the page – datasets and reference tables are listed at the bottom of the page, and contact details for a named person responsible for that release are to the right.
  • The redesigned theme pages, which are launching shortly and will be rolled out gradually across each theme, are simplified and easier to understand, and much more visual than the current text-based version. A moving carousel, in the centre, gives the most recent data. They are a work in progress and will be improved as more pages are updated.
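The nested hierarchy Geoff described can be sketched as a small data model. This is only an illustration of the structure as I understood it: the class names, field names and sample publication below are my own assumptions, not anything taken from the ONS system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Edition:
    # One edition folder: everything released on a single date.
    release_date: date
    nuggets: list[str] = field(default_factory=list)  # e.g. charts, data tables


@dataclass
class Publication:
    # One publication folder, holding one edition per calendar entry.
    title: str
    editions: list[Edition] = field(default_factory=list)

    def all_editions(self) -> list[Edition]:
        # Newest first, similar in spirit to an "all editions" listing.
        return sorted(self.editions, key=lambda e: e.release_date, reverse=True)


@dataclass
class BusinessArea:
    # Top-level section folder.
    name: str
    publications: list[Publication] = field(default_factory=list)


# Hypothetical sample data, purely for illustration.
bulletin = Publication(
    title="Example statistical bulletin",
    editions=[
        Edition(date(2011, 11, 1), ["summary", "data table"]),
        Edition(date(2012, 1, 1), ["summary", "chart", "data table"]),
    ],
)
area = BusinessArea(name="Example business area", publications=[bulletin])

latest = area.publications[0].all_editions()[0]
print(latest.release_date, latest.nuggets)
```

Walking down from business area to publication to edition mirrors the click path on the site, and sorting the editions by date is the equivalent of following the "all editions" link to find earlier data.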

Geoff gave a quick run-through of the navigation tabs across the top of the site:

  • Browse by theme – alphabetical index of themes -> individual theme pages, with the most relevant or important content at the top.
  • Publications – chronological list, with filters on the right to narrow down content.
  • Data – chronological list, search for datasets and reference tables here (not available in publications list).
  • Release calendar – all releases, chronologically, including future releases (the landing page only includes big releases). If you click through to a release page there’s a link to all editions at top right, to access previous data.
  • Guidance and methodology – gives background on the ONS and data collection, classifications etc.
  • Media Centre – includes official statements and releases, and letters correcting misinterpretations of stats in the media.
  • About ONS – most useful is the ad hoc research undertaken by ONS, which isn’t searchable in the publications indexes. Go to Publication Scheme under What We Do, then Published Ad Hoc Data on the left.

Continuing problems with the site

The main issue users have raised since the redesign is difficulty in finding content. The ONS has decentralised publishing, which means each department is responsible for its own releases (around 460 staff contribute to the site). This has led to inconsistency, as some staff are reluctant to change old methods or not interested in web standards, and some are just too busy. The ONS are working on solutions:

  • training staff on how to tag content with six or seven most useful keywords (too few, or too many irrelevant ones, mean weaker search results), and improving the metadata.
  • publishing support team to help departments who are too busy or uninterested.
  • health checks are run on content regularly.
  • there is pressure from management to conform to the new standards.
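A content health check along these lines might look something like the sketch below. It is a guess at the idea, not the ONS's actual process: the function name is hypothetical, and the target range of six to seven keywords is simply taken from the training advice above.

```python
def check_keywords(keywords: list[str], low: int = 6, high: int = 7) -> str:
    """Classify a content item's keyword list against a target range."""
    # Ignore blank entries and case-insensitive duplicates before counting.
    unique = {k.strip().lower() for k in keywords if k.strip()}
    if len(unique) < low:
        return "too few"
    if len(unique) > high:
        return "too many"
    return "ok"


# "GDP" and "gdp" count as one keyword, so this item has only two.
print(check_keywords(["GDP", "gdp", "economy"]))
```

A check like this can only count keywords; judging whether they are relevant still needs a person, which is presumably where the training and the support team come in.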

Practical examples

We ran through some real search queries for tips on searching the site, with assistance from a member of the customer services team (whose name I missed, sorry!). The main advice was to search through the release calendar using filters as necessary (selecting ‘last 5 years’ clears future releases from the list), and to use the ‘all editions’ link on each release page to locate time series data.

Unfortunately, the practical examples just proved that the search functionality of the site still needs improvement (if a roomful of information professionals struggles to find data you have a problem!). Advising users to call the customer services team with any queries is helpful but no use in a high pressure environment where data is needed within hours, not days – what I really needed were ways of finding the stats myself.


  • The redesigned site is much cleaner and simpler than the old version, and easier to navigate, but it’s still difficult to actually find specific data. It’s a shame the ONS didn’t take advantage of having a room full of information professionals to interrogate the system further and to make notes of improvements needed.
  • Some of the problems the ONS are facing are familiar – they’ve decentralised uploading of content, but some staff are reluctant to adopt new techniques and others are over-keen and tag excessively. This is true of other new technologies being adopted across many library sectors (certainly it applies to social media in the news industry). It’s an issue of good training and perseverance with the new standards, and having support from management is vital.
  • Some issues with the redesign are similar to those we’ve experienced in relaunching our intranet recently – lack of redirects from old pages, decentralising, need for training.

Applying what I learned

  • The key figures and carousel on the front page of the ONS site will be incredibly useful for finding the most recent headline data quickly (a common query).
  • The new theme pages will be very useful once they are launched, as a quick way to access key figures on a topic (another common query).
  • I’ll bookmark the ad hoc data page as an extra location to check for data.
  • The training also offered some good ideas on how to ensure consistently good content and metadata, which we could apply to any new roles that our department undertakes.

Research: The Reading Agency’s Digital Research Report

The Reading Agency’s Digital Research Report (see article in this month’s CILIP Update) contains some interesting statistics on the current level of digital engagement in UK public libraries.

The news is positive in terms of engagement – a majority of libraries (66.7%) use online resources for marketing, and 59.6% provide wifi access. Many also use digital resources in reading activities (65.5% use digital photographs, 32.7% use Twitter), and 40.4% “use social media to engage with young people”.

But work still needs to be done to promote the proper application of digital media. Training is patchy, and over 98% of respondents don’t have a digital strategy.

Using digital media is all well and good, but if you don’t apply it appropriately you won’t get the most out of it. Rather than eagerly jumping on the digital bandwagon, it pays to consider the best (and most cost-effective) ways your library can utilise digital media first.

SEO: not always a good thing

There’s an interesting post from Kelly McBride over on Poynter, discussing the “ground zero mosque” story.

I use the quotation marks because the proposed building isn’t on ground zero and isn’t actually a mosque but an Islamic cultural centre, including, as McBride says, “a pool, community rooms and offices”.

Unfortunately once the “mosque at ground zero” story started circulating, it was quickly picked up and broadcast throughout the media in the US and worldwide. A quick check of UK papers shows 111 articles containing the falsehood (including, perhaps unsurprisingly, articles in today’s Daily Mail and Daily Express that make no attempt to correct the mistake).

Even though the media has (largely) recognised the error, the phrase won’t go away because the dissemination of news online means the temptation is there to tag every related story with “ground zero” “mosque” to pick up readers using those search terms.

As McBride points out:

…now that the story has peaked, now that we know the real facts, can anyone possibly correct the record? Not if Google has anything to say about it.

That’s because accurate or not, people are searching for the term “ground zero mosque.” So if you want to reach people who are looking for information, you have to use that term.

It’s easy enough to do in a story meant to debunk the phrase. All you have to write is, “It’s not a ground zero mosque.” But, what about ongoing coverage? Must you keep using the inaccurate term?

Sadly, the answer is yes, according to people familiar with SEO practices.

McBride also makes the point that, in a world where bloggers and not just media organisations play a role in initiating news stories, fact-checking is increasingly important for journalists. More reason than ever to boost news libraries, not close them!

Guardian jobs: Trainee data researcher

The Guardian is currently advertising for two trainee data researchers, to work with the graphics and data teams as well as us lovely people in the research department.

According to the ad, “They should be enthusiastic about data, willing to learn, and have good visual and statistical skills. The two roles are offered as 1-year fixed term contracts, 5-day week working pattern. Closing date for receipt of applications is Sunday 8 August 2010.”

People with library and information management backgrounds are encouraged to apply, so if you want to work with us lovely lot in the Guardian library get writing!

Full job spec here

Library Day in the Life: Monday

Assistant librarian, Guardian News & Media (media library)

– Library Day in the Life project

So, a quick catch up on my day so far.

Today is an exciting one for the Guardian – its investigation into the Afghan war, in conjunction with the New York Times and Der Spiegel (using data provided by Wikileaks), went to print this morning and is causing ripples (the White House isn’t happy…). Wikileaks got hold of five years’ worth of army incident logs, which have been analysed and processed by journalists here.

Our role on the project was to create a glossary of terms, posted on the Guardian Datablog, to explain some of the numerous acronyms that pepper the reports. We’ve been working on it for a couple of weeks – finding new acronyms, then trying to identify their meaning using web resources. It was nice to come into work first thing and find my byline in the paper!

Otherwise it’s been a normal day in the office.

  • Uploaded Saturday’s From the archive article to the website, which was printed in Saturday’s paper (we no longer work weekends so we catch up on weekend jobs every Monday). Basically, it’s an ‘on this day’ column that runs in the paper Monday-Saturday; we find and edit the articles, and once they’re published we add them to the series page on the Guardian website using InCopy software, tagging them with relevant keywords (one of my colleagues commented the other day that tagging is just a new method of using a very traditional library skill, like marking up newspaper articles for placement in cuttings files in the old days). Then we tweet them using the department’s account (@guardianlibrary). I’ve written about the From the archive series before here.
  • Checked over From the archive articles for the next few weeks (they need to be edited down to 420-460 words)
  • Attended a department meeting about new roles that are going to be created at the end of the summer.
  • Helped rewrite my job description (I’m going on maternity leave in 7 weeks so we’re advertising for cover). It was so out of date it mentioned Access, a program I’ve never used here (and I’ve been here ten years).
  • Took a query from the Graphics department – they’re planning an interactive graphic for the web on the London 2012 Olympic site, so they asked me to find as much info as I could on the plans for each building. I’ve had a quick look on the web but I think I’ll need to ring the Olympic Authority later.

Now for lunch.

It’s been a quiet morning but the phone has started ringing now. The number of queries we get from journalists varies wildly day to day – some days we only get a couple, like today (so far!), other days the phone doesn’t stop ringing.

  • A journalist is writing a feature piece on David Cameron’s first 100 days in office. Firstly, I had to clarify which date will be his hundredth (a bit tricky – do you count the day he took over as day zero or day one?). Now I’m writing a timeline of key Cameron moments so far, which will aid the writer but might go in the paper too (and will definitely go onto our intranet). One of those “why haven’t we been doing that anyway?” moments.
  • Profile, interview and news search on Greg Mortenson for a journalist – interviewers often want background info on interview subjects
  • Showed a colleague how to update our BP oil spill timeline online
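The "hundredth day" question from the first query comes down to an off-by-one choice, which a few lines of code make explicit. This is a minimal sketch: the helper name is my own, and the start date of 11 May 2010 is an assumption added for illustration, since the date isn't given above.

```python
from datetime import date, timedelta


def nth_day(start: date, n: int, start_is_day_one: bool = True) -> date:
    """Return the date of the nth day under either counting convention."""
    # If the first day counts as day one, day n is n - 1 days after the start;
    # if it counts as day zero, day n is a full n days after the start.
    offset = n - 1 if start_is_day_one else n
    return start + timedelta(days=offset)


# Assumed start date, for illustration only.
took_office = date(2010, 5, 11)

print(nth_day(took_office, 100, start_is_day_one=True))
print(nth_day(took_office, 100, start_is_day_one=False))
```

The two conventions always give dates one day apart, so the important thing is to agree with the writer which one the piece is using.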

Site visits: GMTV

Last week Pete Fox, the archive manager at GMTV, was kind enough to take an hour out to give us a tour of the GMTV archives. Although the subject matter often differs (more weighted towards celebs, less hard news) and the content is in a different format (film and images, not printed text), the research methods and the process of archiving material are very similar. We don’t get the fun of sharing a lift with Jenni Falconer though (or signed photos from the columnists!).

Reflections on visit:

  • Little impetus or funding for digitising collection of tapes – digitised as they get asked for, as part of archiving the daily programmes; old tapes that expire are just thrown away
  • Rolling out services to users doesn’t mean they will do the work themselves

Part of the job: On This Day

One of the jobs our department is responsible for is the From the Archive column, which runs on the leader page of the Guardian each day. Our basic brief is to find an interesting article that ran on the same date, and edit it down to around 450 words. It doesn’t have to be pure news, or editorial – it can be about anything as long as the writing, the author or the subject is interesting. Recent pieces have covered the films Tootsie and Gandhi (1983), airport virginity tests (1979), sherry parties (1936) and the penny post (1840).

Rather than poring over old copies of the newspaper, we use the digital archive to search for stories, either by browsing specific issues for events we’ve already pinpointed or searching for topics using the advanced search option. It can be frustrating when you can’t find a decent article on a major event, or when you realise your killer piece has already been printed in the column, but you do come across some really fascinating items.

Like the photo, from a March 1950 edition of the Guardian, which is captioned “Various styles of negotiating a barbed wire fence in the English National Youths’ Cross-country Championship at Aylesbury on Saturday”. I don’t have fond memories of cross country running at school, but I don’t think it was ever that extreme.

Cross country running and a barbed wire fence