Site visit: British newspaper library

Bound volumes of newspapers at the British Library. Photograph: Martin Argles

It’s traditional to send our trainee on a visit to the British newspaper library in Colindale, and I’d not been since I was one many moons ago, so I tagged along with Nina and our archivist when they visited last week.

The newspaper library is in the process of moving to Boston Spa, the British Library’s base in Yorkshire, and the transition is evident. Gone are the impressive cameras I remember, used for photographing newspaper pages for microfilm. Fragments of binding and aging newsprint litter the floor in the stacks (every bound volume has to be weighed and measured prior to the move north, and if they sweep up the dust, airborne particles could cause damage).

It was really interesting to hear about the move – as well as the northern base there’ll be a dedicated reading room at the British Library on Euston Road (although bound volumes will have to be ordered from Boston Spa). Fascinating, too, to see inside “the pen” where some of the more precious volumes are held, including (to my other half’s delight) volumes of Marvel comics from the 1970s and British counterparts from their heyday in the ’40s.

The newspaper library fulfills my romantic ideal of an archive – the musty smell of weathered paper, cracked spines of long-forgotten tomes like John Bull and The Cherokee Phoenix (an attempt at a reservation paper from the 1830s). I could happily get lost in the stacks.

Times have changed, space is short and there’s a pressing need to preserve old volumes (an ongoing programme of digitisation will ensure safer access to millions of pages of print, though 40m pages is only a small proportion of the overall collection). The move is clearly important to the continuing success of the newspaper archive. But I hope that some of the magic of Colindale will remain at the new facility at Boston Spa.

Martin Belam on the editorial pitfalls when digital and print collide

Martin Belam has flagged up one of the dangers of online reporting over on currybetdotnet.

Yesterday’s Times website headline for the Sean Hoare story, Hacking whistleblower found dead, was unfortunately prepended with the ‘Live’ tag, leading, as Martin says, to the formula “Live: Someone is dead”.

the perfect example of something that wouldn’t be allowed to happen in print, but which hits a magic Venn diagram intersection of technology, editorial and information architecture allowing it to happen digitally.

Martin suggests adding more options for prepends – ‘Breaking’ or ‘Latest’ for example, which would remove the unintentional pun in the headline for such a tragic story.
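To make Martin's suggestion concrete, here is a minimal sketch of how a publishing tool might swap the prepend when a headline is about a death. Everything in it is an assumption for illustration: the word list, the function names and the choice of 'Latest' as the fallback are hypothetical, and it says nothing about how the Times's actual system works.

```python
# Hypothetical word list, for illustration only; a real newsroom would
# want an editor's judgment rather than a keyword match.
SENSITIVE_WORDS = {"dead", "dies", "death", "killed"}


def choose_prepend(headline: str, default: str = "Live") -> str:
    """Return the tag to put in front of a headline.

    Falls back to 'Latest' when the default is 'Live' and the headline
    contains one of the assumed sensitive words.
    """
    words = {w.strip(".,:;!?'\"").lower() for w in headline.split()}
    if default == "Live" and words & SENSITIVE_WORDS:
        return "Latest"
    return default


def tagged_headline(headline: str) -> str:
    """Join the chosen tag and the headline in the 'Tag: headline' form."""
    return f"{choose_prepend(headline)}: {headline}"


print(tagged_headline("Hacking whistleblower found dead"))
```

A keyword check like this is crude, and it would only ever be a safety net behind a person choosing the right tag in the first place.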

It’s clear that more consideration needs to be given to traditional page layout when information architects, who are often far removed from the reporting process, are working in the media sphere.

Throwaway blogging: of morons and Mormons

I wrote my first post for the shiny new research department blog, From the archive, today. Well, two really, but the other one won’t launch until Thursday night.

Both posts are based on stories from the Guardian and Observer digital archive (the blog does what it says on the tin!), but are very different beasts.

Thursday’s post pulls together five or six articles, and a British Pathe video, to give the reader a rounded, comprehensive view of a moment in history (in this instance, the death of Brian Jones and the Rolling Stones’ Hyde Park concert a few days later). It took the best part of the morning to compile, with numerous rewrites, and still isn’t finished.

Today’s post is a one-paragraph correction I chanced across while I was editing an article for the On This Day column. It took less than half an hour to capture the image, write the blurb and post online. It had been retweeted twenty times by this evening and will no doubt be forgotten by the morning. But it is no less valid a blogpost because of it.

That’s what I like about blogging – there’s a time to be measured and a time to post a rapid, throwaway remark that nonetheless captures the imagination, however briefly.

Working week: June 27-June 28

I’m going to keep a work diary every week so I can track what I’ve done, and try to reflect before the start of the next week, to pick out portfolio tasks. I do like lists!

Monday June 27

  • Went through emails (first day back after 9 months, rather a lot of deleting!)
  • From the Archive pieces for July 7 and 8 – found and edited articles on striking dockers (1923) and Brian Jones (1969)
  • Guardian Films query – El Salvador journos/photogs in 1980s/1990s – checked Factiva and our internal text archive for articles with datelines, suggested digital archive for pics

Tuesday June 28

  • Factiva training – haven’t used it for a few years, good reminder of connectors, searching, sources, alerts, introduction to new workspace feature
  • Query about fish and chip shops – statistics, history – Factiva articles search, internal text archive to fill in gaps, Google search for source for stats
  • Query about Sarah Helm – background info, interviews, news stories – Factiva and internal text archive search
  • Had a look over the intranet to see what needs improving, reorganising
  • Updated Afghanistan casualty lists for intranet and Datablog
  • Wrote a blog post to accompany the Brian Jones archive piece next week

Guardian 190: From the archive

I can’t claim any credit for this because I’m on leave, but I was involved in pushing for the From the Archive blog initially so I’m a little bit proud!

The Guardian is celebrating its 190th birthday this month, and has pulled together a bundle of resources, including a rather nifty interactive showing 190 key moments in the Guardian’s development.

As part of that, the research department are blogging an article from each year – in order – on their blog From the Archive. I’m a bit late in highlighting it – they’ve already reached 1896 – but there’s plenty more to come, and you can access the back catalogue on the blog or through the main Guardian 190 microsite.

On the web: DocumentCloud

The Christian Science Monitor librarian (@CSMLibrary) flagged up a blogpost about DocumentCloud from the NewsliBlog today.

DocumentCloud is a free online tool for converting PDF documents into web text that you can annotate, making the contents much more accessible and useable. News organisations in particular have used it to provide readers with official documents that add value to a news story.

As Derek Willis (@derekwillis) says in his blogpost:

It’s a great way to maintain a set of files that anyone from the newsroom can access and annotate, making it a good candidate for long-term project work. And when you’re ready to show that work to the world, you can make any or all of the files public.

I’ve not had time to play around with it, but I’ve struggled in the past with ways of stripping text from a PDF quickly and without too many errors, so anything that helps speed up the process can only be a good thing!

I can think of several time-consuming queries it would have helped with off the top of my head – editing From the Archive articles that predate our text archive (anything pre-1984), taking content for the Datablog from government documents that are only released in PDF or the Russian spies story, for starters.

On the web: data visualisation: howbigreally.com

Something we get asked for fairly regularly in the news library is a size comparison – some journalists like to be able to equate a distance or area in a story to a recognisable place in the UK (recent examples include a piece on nature reserves “covering an area the size of the west Midlands” and a reference to British manoeuvres in Sangin, Afghanistan “to capture an area the size of the Isle of Wight”).

There is a question mark over the validity of such comparisons – how much value do they really add to a story, and how many people have a strong enough grasp of geography to be able to visualise even UK areas? But they show no sign of dropping out of use.

A new online tool developed by the BBC could help media librarians, journalists and readers to draw more relevant, and useful, comparisons in future. BBC Dimensions, found at howbigreally.com, takes a template of a newsworthy event (for example the area affected by the floods in Pakistan, the Twin Towers or the BP oil spill) and lays it over a Google map at a location of your choice.

Dimensions is a prototype and it has its limits – you can’t create a new template so the event you’re covering has to be listed already; you can’t play with the shape of the template so you need a reasonable amount of spatial awareness to be able to compare it to specific areas like counties or countries; and, perhaps most worryingly, the disclaimer at the bottom states, “We make no guarantee as to its accuracy, reliability or performance” – but it is a pretty good starting point for queries of comparison, and it’s a nifty way of using data visualisation to add value to a news story.

More on SEO: STI or STD?

I inadvertently created my own example of the “ground zero mosque” problem just after I wrote about it last week.

Writing on the Datablog, I posted the latest statistics on sexually transmitted infections (STIs) from the Health Protection Agency. STI is the correct, recognised term for things like chlamydia, herpes, gonorrhoea and HIV. The problem is that for years such infections were termed STDs – sexually transmitted diseases – and although the medical world has stopped using that term, the real world hasn’t.

I ummed and aahed about whether I should be accurate, and use STI throughout the article, or go with the more recognisable STD. In the end I used STI in the body but stuck with STD in the headline and, therefore, the URL (“STDs in England: Breakdown by region, gender and ethnicity”). That way, I reasoned, search engines would pick up the term STD but the article stayed true to the recognised term.
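For anyone wondering why the headline choice carries through to the URL, here is a minimal sketch of how a headline could be turned into a URL slug. It assumes a simple lowercase-and-hyphenate rule; the `slugify` function is hypothetical and is not a description of the Guardian's own publishing system.

```python
import re


def slugify(headline: str) -> str:
    """Hypothetical slug rule: lowercase the headline, keep runs of
    letters and digits, and join them with hyphens."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return "-".join(words)


headline = "STDs in England: Breakdown by region, gender and ethnicity"
slug = slugify(headline)
print(slug)
```

Under that assumed rule, whichever term appears in the headline is the one that ends up in the address, however carefully the body copy sticks to STI.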

Surprisingly, no one in the comment thread picked up on the use of STDs versus STIs. I’m sure it’s frustrating for sexual health professionals when the media continues to peddle outdated terms, but until the SEO process adapts unfortunately we need to keep using them, if we’re to capture as many readers as possible.

You’ll notice by the way that I refrained from using ‘sex’ rather than ‘gender’ in the headline, which would probably have brought in a lot more…

Library Day in the Life: Wednesday

Assistant librarian, Guardian News & Media (media library)

- Library Day in the Life project

Another conference guest this morning – Diane Abbott, also running for the Labour leadership. Only Ed M and Andy B to go and we’ll have a full set.

  • Attended morning conference
  • Updated the David Cameron timeline on the intranet with his Gaza comments
  • Wrote a story for the company intranet and noticeboard giving links to the new resources on our intranet, hopefully it’ll drive up interest
  • Looking for articles for the From the Archive series, for a couple of weeks’ time. Found one on Woodstock (1969), Churchill’s speech on the Battle of Britain (1940) and a nice miscellany on the use of gravy browning instead of stockings (1944) – they’re the ones I like!
  • Tweeted my Datablog post on animal testing from yesterday, and a sidebar on musicians in politics by one of my colleagues that is on page 3 today (we’re @guardianlibrary)

Lunch – managed to hold out until 2 so the afternoon goes quickly!

  • The readers’ editor wanted to track down a copy of the 1945 Potsdam agreement, to check whether it was signed by Churchill or Attlee. Lots of web and encyclopedia digging later, I found an image on the PBS website
  • Looking for articles on children living in circus communities, or on the margins of society. Beautifully random, if tricky to pin down
  • Writing a quick sidebar for the foreign pages on hot political memoirs, let’s hope they print this one

That’s me done for the week (I’m only a three-day part timer). Looking forward to reading everyone else’s logs!

Library Day in the Life: Tuesday

Assistant librarian, Guardian News & Media (media library)

- Library Day in the Life project

Tuesday is brownie day in the Guardian library, always good to come in to baked goods!

  • Added the David Cameron timeline I compiled yesterday to the department intranet – now I just have to remember to keep updating it
  • Attended morning conference, which is open to everyone – David Miliband was the special guest
  • Tried contacting the Olympic Delivery Authority press office to get hold of venue floor plans, but their phone is permanently on voicemail, grr
  • Tried to track down a date of birth for tomorrow’s obituary pages, with no luck. It’s going to be one of those frustrating days, I can tell
  • Created a Labour leadership page on our intranet, with profile info for each candidate (Who’s Who entry, Guardian profile & news page etc), a timeline of the leadership race and links to relevant online resources (Labour Party site, Unions Together, Guardian microsite, Guido Fawkes)

Lunch beckons, before I eat all the chocolate brownies.

  • Finally got hold of the ODA press officer – they’re on site all day as it’s only two years to go (you’d think they’d expect calls from the press today, ho hum) but I’ll be able to speak to them tomorrow. So sort of successful
  • Added a nifty searchable map of UK libraries by postcode (created by @psychemedia) to our Delicious account
  • Wrote a new Datablog post, on the cheery subject of animal testing. Now I know that mice are most likely to be tested on, but nearly 400,000 fish were tested on last year too. Not sure when that will ever come in useful!
  • Added the new GDP figure from the ONS to the Datablog post on GDP