Training: Tweetdeck, 22 November 2011

I’ve been using Twitter for a few years but I’m behind the times with my software! Following a talk about the use of Twitter, I signed up for a Tweetdeck intro session with John Stuttle, one of the Guardian’s systems editors.

Tweetdeck offers much more usability than the basic Twitter feed. Key for me is that it allows you to track more than one account at once (and from more than one social network), to tweet from more than one account simultaneously (say my personal and department ones) and to publish timed tweets (so we could set a tweet to launch the weekend’s From the archive in advance, for example).

John recommended signing up for a link-tracking account as well, which allows you to analyse the statistics on how many people used your link to click through to a story, versus other links, and use that to improve your tweeting. The service provides various other stats too – to view statistics and graphs just click the Analyze link at the top once you’re signed in, or click on Info Page next to an individual link.


Marilyn Johnson, Brewster Kahle and the risks of leaping into digital with both feet

I’ve been out of the loop for a couple of weeks, camping in the not-quite-wilds of Northumberland. It’s strange being so disconnected from the web (I didn’t even have mobile reception for a lot of the time); it’s made me realise how much I rely on the internet to stay connected, to people, to the news, to the industry.

Having time away also meant I finished reading This Book Is Overdue! How Librarians And Cybrarians Can Save Us All, by Marilyn Johnson. So even though I had two weeks internet-free, I spent it reading about digital librarians!

Marilyn Johnson’s book, which I’ll review separately, takes on new meaning in light of Seth Godin’s article The future of the library (and apologies if the debate has moved on while I’ve been away!).

Godin’s central argument was that fusty old librarians need to ditch the paper and move into the digital sphere. Johnson’s book provides ample evidence that librarians have been working online for decades (OCLC, the Online Computer Library Center, was founded in 1967, when the web was a mere twinkle in Tim Berners-Lee’s eye).

It also raises the risk of libraries leaping into digital without considering the ramifications for non-digitised, specialised collections, which can have funding and space cut, or be lost entirely.

A similar issue was raised this week by Brewster Kahle on the Internet Archive blog:

A reason to preserve the physical book that has been digitized is that it is the authentic and original version that can be used as a reference in the future. If there is ever a controversy about the digital version, the original can be examined. A seed bank such as the Svalbard Global Seed Vault is seen as an authoritative and safe version of crops we are growing. Saving physical copies of digitized books might at least be seen in a similar light as an authoritative and safe copy that may be called upon in the future.

I was involved with digitising the Guardian and Observer a few years ago, creating a fantastic online resource of newspaper articles dating back to 1791 that would otherwise not be widely accessible. But we would never have considered scrapping the bound originals, or even the microfilm copies, once digitisation was complete.

Digitising books and library collections is an important step forward, but until a system is designed that is 100% reliable, not open to corruption or human error, and with a long-term shelf life, it would be madness to do away with paper collections altogether.  

Websites: LinkedIn

I’ve spent the evening doing the online equivalent of housework – tidying up my blog (the blogroll on the right was horribly out of date), and getting my online house in order.

Part of that involved sorting out my LinkedIn profile. I joined a while ago, but I’ve only been checking it every couple of months. I need to make it one of my go-to places on the web, alongside my email, Twitter, Flickr, the blogs I check regularly and, I admit it, Facebook too!

So I’ve updated and added to my profile, connected to a few of the most obvious colleagues and friends, and also joined the LIKE group (pending approval!). Hopefully it will become a key way of engaging with the information profession at large. If you’re on LinkedIn, add me to your network.

Guardian 190: From the archive

I can’t claim any credit for this because I’m on leave, but I was involved in pushing for the From the Archive blog initially so I’m a little bit proud!

The Guardian is celebrating its 190th birthday this month, and has pulled together a bundle of resources, including a rather nifty interactive showing 190 key moments in the Guardian’s development.

As part of that, the research department are blogging an article from each year – in order – on their blog From the Archive. I’m a bit late in highlighting it – they’ve already reached 1896 – but there’s plenty more to come, and you can access the back catalogue on the blog or through the main Guardian 190 microsite.

Chartership resources: Using Delicious to track reading

I’ve been puzzling over ways to record all the articles I read and sites I look at during the Chartership process (yes, I’m avoiding my CV writing!).

The simple solution would be to list them on a separate blog page above, like the Web Work page I use to record my writing and research pieces. There’d be no way to categorise them though, no way of adding keywords, and judging by the number of blogposts I’ve already scanned today it’s going to be a long, long list!

Instead, I’ve set up a Delicious account specifically for Chartership reading. We have a list at work, although it’s always been underused. It’s a great way of keeping track of useful resources in the age of information overload, as well as sharing links with others.

As well as keeping a record of my reading list, it also means I won’t have to blog about everything, just the articles and resources I’ve found particularly useful, and which I can apply to my day to day role.

I’m trying to select a list of tags that will help me categorise links without getting completely out of hand!

On the web: DocumentCloud

The Christian Science Monitor librarian (@CSMLibrary) flagged up a blogpost about DocumentCloud from the NewsliBlog today.

DocumentCloud is a free online tool for converting PDF documents into web text that you can annotate, making the contents much more accessible and useable. News organisations in particular have used it to provide readers with official documents that add value to a news story.

As Derek Willis (@derekwillis) says in his blogpost:

It’s a great way to maintain a set of files that anyone from the newsroom can access and annotate, making it a good candidate for long-term project work. And when you’re ready to show that work to the world, you can make any or all of the files public.

I’ve not had time to play around with it, but I’ve struggled in the past with ways of stripping text from a PDF quickly and without too many errors, so anything that helps speed up the process can only be a good thing!

I can think of several time-consuming queries it would have helped with off the top of my head – editing From the Archive articles that predate our text archive (anything pre-1984), taking content for the Datablog from government documents that are only released in PDF, or the Russian spies story, for starters.

On the web: data visualisation

Something we get asked for fairly regularly in the news library is a size comparison – some journalists like to be able to equate a distance or area in a story to a recognisable place in the UK (recent examples include a piece on nature reserves “covering an area the size of the west Midlands” and a reference to British manoeuvres in Sangin, Afghanistan “to capture an area the size of the Isle of Wight”).

There is a question mark over the validity of such comparisons – how much value do they really add to a story, and how many people have a strong enough grasp of geography to be able to visualise even UK areas? But they show no sign of dropping out of use.

A new online tool developed by the BBC could help media librarians, journalists and readers to draw more relevant, and useful, comparisons in future. BBC Dimensions takes a template of a newsworthy event (for example the area affected by the floods in Pakistan, the Twin Towers or the BP oil spill) and lays it over a Google map at a location of your choice.

Dimensions is a prototype and it has its limits – you can’t create a new template so the event you’re covering has to be listed already; you can’t play with the shape of the template so you need a reasonable amount of spatial awareness to be able to compare it to specific areas like counties or countries; and, perhaps most worryingly, the disclaimer at the bottom states, “We make no guarantee as to its accuracy, reliability or performance” – but it is a pretty good starting point for queries of comparison, and it’s a nifty way of using data visualisation to add value to a news story.
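For the simpler “an area the size of…” queries, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration, not anything the BBC tool or our library actually uses: the function names are made up for the example, and the reference areas are approximate figures that would need checking against an authoritative source before going anywhere near print.

```python
import math

# Approximate areas in square kilometres. Illustrative values only;
# verify against an authoritative source before publishing.
REFERENCE_AREAS_KM2 = {
    "Isle of Wight": 380,
    "West Midlands (county)": 902,
    "Greater London": 1572,
    "Wales": 20779,
}


def closest_comparison(area_km2, references=REFERENCE_AREAS_KM2):
    """Return (place, ratio) for the reference area nearest in size.

    "Nearest" is judged proportionally (by log ratio), so an area half
    the size of a place counts as equally close as one twice its size.
    """
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    place = min(
        references,
        key=lambda name: abs(math.log(area_km2 / references[name])),
    )
    return place, area_km2 / references[place]


def describe(area_km2):
    """Turn an area into a short comparison phrase."""
    place, ratio = closest_comparison(area_km2)
    return f"about {ratio:.1f} times the size of {place}"
```

Calling describe() with an area in square kilometres gives a phrase built from whichever listed place is proportionally closest. It does nothing about shape, which is exactly where a map overlay like Dimensions earns its keep.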

The new Times website – commentary from elsewhere

There have been some interesting comments in the blogosphere today on the new Times website and its upcoming paywall.

The Guardian’s Organgrinder blog thinks the design lacks some imagination, mimicking as it does a broadsheet newspaper, but is impressed with the functionality and digital tools:

Overall, the Times site makes a very accomplished effort at bringing the style and symbolism of the paper on to a screen without sacrificing the breadth and depth of information that readers expect from web pages, and the Sunday Times one is an interesting stab at displaying the vast wealth and diversity of all those supplements. Whether they’re beautiful enough to pay for is, of course, another issue.

Over at the Thoroughly Good blog the mood is more contemplative, pondering the paywall model and musing about the attractiveness of reading newspapers on the iPad on the way to work, before reaching the conclusion that it’s all part of a media conspiracy:

…the Times paywall isn’t just about reeducating a generation, redefining the rules of the internet or necessarily about investing money back into quality journalism. It might be all of these things. Who knows.

But first and foremost this is about one hand washing the other. Two – or maybe three – media giants riding on each other’s coat tails.

Suddenly ignorance is a considerably more attractive option. Because as grumpy a nearly-middle-aged man as I am, I am not about to commit even more money I haven’t got to a device I don’t need just because I’m swayed by how lovely a redesigned website will look on that new device.

Adam Tinworth meanwhile argues that while News International may lose traffic and exclude many from commenting on pieces, for those who subscribe the benefits of joining an exclusive online community may be great:

…what News International are actually trying to create is, in essence, a private members’ club. There will be a limited number of people joining in on discussion, largely around content. People sharing what they think will be identifiable, and they will have paid an entrance fee to get in there. This is, in fact, a community model, just one that differs from the wide, inter-connected community model we’re used to on the open web.

Now, if this is what The Times is attempting, it’s a very interesting experiment, and one that I’ll be watching with a great deal of interest.

We’ll just have to wait until the paywall is in full effect to see whether these ideas ring true.

Resources: Eurostat

A representative of Eurostat’s media support team paid a visit to the office to talk us through their statistics database. It can be hard sometimes to locate specific stats on an unfamiliar website, particularly if you are up against a tight deadline, so it was good to get an overview of the type of data stored and the different ways you can search Eurostat.

  • Eurostat is the central store for EU and EFTA countries’ stats, the central institute of the European Statistics System
  • Data from NSIs (national statistical institutes) go to Eurostat, where they are harmonised; Eurostat then compiles Euro aggregates and disseminates them (mainly web only)
  • Euro Indicators are regularly released economic indicators – they always cover the EU and sometimes extend further if comparable stats are available – unemployment, GDP, trade, inflation etc – deficit and debt released every April and Oct
  • Also release ad hocs as and when – GDP per capita, population, tax trends, the yearbook, for Women’s Day etc
  • Don’t cover Eurobarometer surveys (DG Comm), EU budget figures (DG Budget, Inforegio for regional figs), tax rates (DG Taxud – who has highest/lowest VAT etc)
  • Their remit is to report stats independently and neutrally, without a political agenda
  • Release calendar is issued every Oct
  • Time series usually go back to the 1990s across all countries, as that’s when the data was harmonised, but go much further back for some countries
  • DG Ecfin has time series data for some economic indicators but not complete as not comparable
  • Microdata isn’t on the site – could be used to identify individuals eg surveys; can get access for research purposes only

To use the site:

  • Country profiles (from home page) – compare a country to EU ave etc; only most recent data; can then link to table of data to download
  • Data in Focus (from home page) – online PDF releases of topics of data; links to data files
  • Stats in Focus (from home page) – text, analysis as well as numbers; links to data files
  • Statistics tab – organised by themes and sub-themes; click on a theme, links down the side give related datasets; click through the hierarchy
  • Search – the homepage and the whole site – can search publications, datasets or metadata – or just search in the database

To use the Statistics Database:

  • go into pre-defined tables, click on topic, work through the hierarchy to get a table of data, access to maps, graphs, can set parameters and download
  • go into database, use the data tool to extract data; can set own parameters, display format etc

Online: Ghostsigns archive

I have so many posts to catch up on! On 18th March I attended the launch of the Ghostsigns archive, overseen by project manager Sam Roberts and created as part of the History of Advertising Trust collection (more of which later).

A ghostsign, for the uninitiated, is painted advertising on a building, a forerunner of the billboard that has all but died out. I became involved with the Ghostsigns project last year, when Sam Roberts contacted me on Flickr to invite me to add a photograph of a sign to his group on the photography site. HAT has enabled Sam to build up the collection from a group on a public website (which now has 450 members and 4,058 photos) to a searchable online archive that will permanently preserve the images, which form an important part of advertising history. My photo, incidentally, wound up on a postcard to celebrate the launch, which was quite satisfying!

To coincide with the launch I wrote a Datablog post providing metadata for each of the signs in the archive – location information, including county and partial postcodes, and image links and descriptions for an initial set of 30 signs. The plan is for Sam or someone on the project to be able to add further image links and descriptions as time allows.
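For anyone curious what a record in that kind of dataset might look like, here is a minimal Python sketch. It is not the actual Datablog spreadsheet or the archive’s schema: the class, the field names and the sample values are all hypothetical, chosen only to mirror the metadata described above, with the image link and description left blank so they can be filled in later.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class GhostsignRecord:
    # Hypothetical field names mirroring the metadata described in the post.
    location: str
    county: str
    partial_postcode: str
    image_url: str = ""    # optional; can be added as time allows
    description: str = ""  # optional; can be added as time allows


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(GhostsignRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values, not a real sign from the archive.
sample = GhostsignRecord("Example Street", "Exampleshire", "EX1")
print(to_csv([sample]))
```

Keeping the two later fields optional means a sign can be listed with just its location details and enriched afterwards without changing the shape of the file.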

I think there are two valuable lessons from my involvement in Ghostsigns:

  • firstly, learning the role a web 2.0/social media website can play in creating an archive. The Ghostsigns project wouldn’t be as far-reaching if it weren’t for the input of the Flickr group’s members, and Sam would never have been able to reach so many other enthusiasts without it. Librarians and archivists shouldn’t just jump on the social media bandwagon at random, but in certain circumstances opening up the acquisitions process to the web, or using sites like Flickr, Twitter or Facebook to reach a new and specific audience, can really pay off.
  • secondly, that the Datablog can be used to serve a purpose beyond the general reporting of news. Yes, the dataset created in partnership with the Ghostsigns archive increases traffic to the Guardian site, and provides raw data for Guardian users to manipulate or use as they please. But it also provides HAT and the Ghostsigns project with a useful tool that can be developed by them in future.