Training: searching statistics on ons.gov.uk


5 February 2012, CILIP HQ (organised by CILIP Information Services Group)

Notes on the day

Geoff Davies, Implementation Manager at the ONS, gave a run-through of the navigation of the newly redesigned ons.gov.uk. Recent improvements include new search functionality, additional synonyms and acronyms and better navigation.

  • Several new elements on the homepage will be useful for headline figures – the “carousel” in the centre which announces the latest big releases, and the Key figures panel on the right which is a quick way of accessing the most up-to-date stats for GDP, unemployment etc.
  • The UK Publication Hub (link at bottom of landing page) holds all government data, not just that held by ONS.
  • ONS YouTube videos give explanations of big releases, and the new interactives are a good way of interrogating data.
  • Links to the previous site are obsolete, so if you’ve saved a URL it won’t redirect to the new site, but all the statistical releases have been carried over, so they will be there if you dig deep enough.

Geoff then outlined the basic structure of the ONS site, which is a simple nested hierarchy:

  • Business area (section) folder -> each publication has a folder -> calendar entry for each edition -> edition folder -> all content “nuggets” released on that date, e.g. charts, data tables, summary, statistical bulletin etc.
  • Every edition published to the site has a separate release page, which goes live on the publication date (the release calendar includes future publications). Everything relating to that release is accessible from the page – datasets and reference tables are listed at the bottom of the page, and contact details for a named person responsible for that release are to the right.
  • The redesigned theme pages, which are launching shortly and will be rolled out gradually across each theme, are simplified and easier to understand, and much more visual than the current text-based version. A moving carousel, in the centre, gives the most recent data. They are a work in progress and will be improved as more pages are updated.

Geoff gave a quick run-through of the navigation tabs across the top of the site:

  • Browse by theme – alphabetical index of themes -> individual theme pages, with the most relevant or important content at the top.
  • Publications – chronological list, with filters on the right to narrow down content.
  • Data – chronological list, search for datasets and reference tables here (not available in publications list).
  • Release calendar – all releases, chronologically, including future releases (the landing page only includes big releases). If you click through to a release page there’s a link to all editions at top right, to access previous data.
  • Guidance and methodology – gives background on the ONS and data collection, classifications etc.
  • Media Centre – includes official statements and releases, and letters correcting misinterpretations of stats in the media.
  • About ONS – most useful is the ad hoc research undertaken by ONS, which isn’t searchable in the publications indexes. Go to Publication Scheme under What We Do, then Published Ad Hoc Data on the left.

Continuing problems with the site

The main issue users have raised since the redesign is difficulty in finding content. The ONS has decentralised publishing, which means each department is responsible for its own releases (around 460 staff contribute to the site). This has led to inconsistency, as some staff are reluctant to change old methods or not interested in web standards, and some are just too busy. The ONS is working on solutions:

  • training staff on how to tag content with the six or seven most useful keywords (too few, or too many irrelevant ones, mean weaker search results), and improving the metadata.
  • publishing support team to help departments who are too busy or uninterested.
  • health checks are run on content regularly.
  • there is pressure from management to conform to the new standards.

Practical examples

We ran through some real search queries for tips on searching the site, with assistance from a member of the customer services team (whose name I missed, sorry!). The main advice was to search through the release calendar using filters as necessary (selecting ‘last 5 years’ clears future releases from the list), and to use the ‘all editions’ link on each release page to locate time series data.

Unfortunately, the practical examples just proved that the search functionality of the site still needs improvement (if a roomful of information professionals struggles to find data you have a problem!). Advising users to call the customer services team with any queries is helpful but no use in a high pressure environment where data is needed within hours, not days – what I really needed were ways of finding the stats myself.

Reflections

  • The redesigned ons.gov.uk site is much cleaner and simpler than the old version, and easier to navigate, but it’s still difficult to actually find specific data. It’s a shame the ONS didn’t take advantage of having a room full of information professionals to interrogate the system further and to make notes of improvements needed.
  • Some of the problems the ONS are facing are familiar – they’ve decentralised uploading of content, but some staff are reluctant to adopt new techniques and others are over-keen and tag excessively. This is true of other new technologies being adopted across many library sectors (certainly it applies to social media in the news industry). It’s an issue of good training and perseverance with the new standards, and having support from management is vital.
  • Some issues with the redesign are similar to those we’ve experienced in relaunching our intranet recently – lack of redirects from old pages, decentralising, need for training.

Applying what I learned

  • The key figures and carousel on the front page of ons.gov.uk will be incredibly useful for finding the most recent headline data quickly (a common query).
  • The new theme pages will be very useful once they are launched, as a quick way to access key figures on a topic (another common query).
  • I’ll bookmark the ad hoc data page as an extra location to check for data.
  • The training also offered some good ideas on how to ensure consistently good content and metadata, which we could apply to any new roles that our department undertakes.

Research: The Reading Agency’s Digital Research Report

The Reading Agency’s Digital Research Report (see article in this month’s CILIP Update) contains some interesting statistics on the current level of digital engagement in UK public libraries.

The news is positive in terms of engagement – a majority of libraries (66.7%) use online resources for marketing, and 59.6% provide wifi access. Many also use digital resources in reading activities (65.5% use digital photographs, 32.7% use Twitter), and 40.4% “use social media to engage with young people”.

But work still needs to be done to promote the proper application of digital media. Training is patchy, and over 98% of respondents don’t have a digital strategy.

Using digital media is all well and good, but if you don’t apply it appropriately you won’t get the most out of it. Rather than eagerly jumping on the digital bandwagon, it pays to consider the best (and most cost-effective) ways your library can utilise digital media first.

On the web: DocumentCloud

The Christian Science Monitor librarian (@CSMLibrary) flagged up a blogpost about DocumentCloud from the NewsliBlog today.

DocumentCloud is a free online tool for converting PDF documents into web text that you can annotate, making the contents much more accessible and useable. News organisations in particular have used it to provide readers with official documents that add value to a news story.

As Derek Willis (@derekwillis) says in his blogpost:

It’s a great way to maintain a set of files that anyone from the newsroom can access and annotate, making it a good candidate for long-term project work. And when you’re ready to show that work to the world, you can make any or all of the files public.

I’ve not had time to play around with it, but I’ve struggled in the past with ways of stripping text from a PDF quickly and without too many errors, so anything that helps speed up the process can only be a good thing!

Off the top of my head, I can think of several time-consuming queries it would have helped with: editing From the Archive articles that predate our text archive (anything pre-1984), taking content for the Datablog from government documents that are only released in PDF, or the Russian spies story, for starters.

On the web: data visualisation: howbigreally.com

Something we get asked for fairly regularly in the news library is a size comparison – some journalists like to be able to equate a distance or area in a story to a recognisable place in the UK (recent examples include a piece on nature reserves “covering an area the size of the west Midlands” and a reference to British manoeuvres in Sangin, Afghanistan “to capture an area the size of the Isle of Wight”).

There is a question mark over the validity of such comparisons – how much value do they really add to a story, and how many people have a strong enough grasp of geography to be able to visualise even UK areas? But they show no sign of dropping out of use.

A new online tool developed by the BBC could help media librarians, journalists and readers to draw more relevant, and useful, comparisons in future. BBC Dimensions, found at howbigreally.com, takes a template of a newsworthy event (for example the area affected by the floods in Pakistan, the Twin Towers or the BP oil spill) and lays it over a Google map at a location of your choice.

Dimensions is a prototype and it has its limits – you can’t create a new template so the event you’re covering has to be listed already; you can’t play with the shape of the template so you need a reasonable amount of spatial awareness to be able to compare it to specific areas like counties or countries; and, perhaps most worryingly, the disclaimer at the bottom states, “We make no guarantee as to its accuracy, reliability or performance” – but it is a pretty good starting point for queries of comparison, and it’s a nifty way of using data visualisation to add value to a news story.

More on SEO: STI or STD?

I inadvertently created my own example of the “ground zero mosque” problem just after I wrote about it last week.

Writing on the Datablog, I posted the latest statistics on sexually transmitted infections (STIs) from the Health Protection Agency. STI is the correct, recognised term for things like chlamydia, herpes, gonorrhoea and HIV. The problem is that for years such infections were termed STDs – sexually transmitted diseases – and although the medical world has stopped using that term, the real world hasn’t.

I ummed and aahed about whether I should be accurate, and use STI throughout the article, or go with the more recognisable STD. In the end I used STI in the body but stuck with STD in the headline and, therefore, the URL (“STDs in England: Breakdown by region, gender and ethnicity”). That way, I reasoned, search engines would pick up the term STD but the article stayed true to the recognised term.
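For anyone wondering how a headline ends up in the URL, here is a minimal sketch in Python. The slugify function is my own illustration of the general idea (lowercase the headline, swap punctuation and spaces for hyphens); I'm assuming this is roughly what a content management system does, not describing how the Guardian's system actually works.

```python
import re
import unicodedata


def slugify(headline: str) -> str:
    """Turn a headline into a lowercase, hyphen-separated URL slug.

    Hypothetical helper for illustration only.
    """
    # Drop accents so non-ASCII letters fall back to their plain ASCII form
    ascii_text = (
        unicodedata.normalize("NFKD", headline)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of characters that are not a-z or 0-9 with one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Remove any hyphen left over at the start or end
    return slug.strip("-")


slug = slugify("STDs in England: Breakdown by region, gender and ethnicity")
print(slug)
```

The point is simply that whatever term goes in the headline is the term that lands in the address, which is why the choice between STD and STI in the headline matters for search.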

Surprisingly, no one in the comment thread picked up on the use of STDs versus STIs. I’m sure it’s frustrating for sexual health professionals when the media continues to peddle outdated terms, but until the SEO process adapts we unfortunately need to keep using them if we’re to capture as many readers as possible.

You’ll notice by the way that I refrained from using ‘sex’ rather than ‘gender’ in the headline, which would probably have brought in a lot more…

Guardian jobs: Trainee data researcher

The Guardian is currently advertising for two trainee data researchers, to work with the graphics and data teams as well as us lovely people in the research department.

According to the ad, “They should be enthusiastic about data, willing to learn, and have good visual and statistical skills. The two roles are offered as 1-year fixed term contracts, 5-day week working pattern. Closing date for receipt of applications is Sunday 8 August 2010.”

People with library and information management backgrounds are encouraged to apply, so if you want to work with us lovely lot in the Guardian library get writing!

Full job spec here