About KatyStoddard

30-something media librarian whose New Year's resolutions include CILIP chartership and regular blogging, with a little bit of tweeting in between.

#Libday8: Day two (31 January)

Library Day in the Life round 8

Senior researcher, Guardian News & Media (news library)

  • 8.45am: Caught up on Twitter on the bus in to work, and read a couple of really interesting #libday8 posts and articles (see below for links)
  • 9.30am: I had a journalist query waiting in my inbox when I logged on – looking for a quote from Hansard (the House of Commons record). There’s an easy-to-use archive on the parliament.uk website which goes back to 1988, and there was a possible date (we like!) but it wasn’t the right one. I searched by MP on Hansard but couldn’t find the quote for that year’s session, so I checked our text archive and Google for the article referenced in the email, to see if I could gather any more info. This led me to a new date (1994), but you can’t search that far back by MP so I headed to the advanced search instead. I should have tried there first! Nothing came up when I searched for the keywords. I even whacked the quote into Google but no joy there either. I’ve asked the journalist for more information. And it seemed so simple…
  • 10.30am: Our trainee maintains a spreadsheet of casualty figures from Afghanistan which feeds into a Datablog post on British dead and wounded. I’m responsible for updating the running total in the article and relaunching it with an amended table when wounded figures are released. The spreadsheet also includes amputation figures, released quarterly by the MoD through DASA. They’re out today, and normally our trainee Nina would just copy over the new figure and tot up the annual total, but the MoD have changed their methodology and now report amputations by financial year. This, and the fact that they don’t publish quarterly figures below five, means our annual totals no longer tally with the official MoD numbers. I’m loath to switch to financial year (“there were xx amputations in 2010” is nicer journalistically than “there were xx amputations in 2010/11”), but as it stands our annual data is incomplete. I’ve added an explanatory note to the spreadsheet, and I’m going to consult the Datablog editor about whether we should switch from Jan-Dec to the financial year.
  • 11am: Still working on the Hansard query! The journalist has provided the source of the quote (a pressure group report), so I’ve got more to go on – it seems the quote may not have been said in the House. I checked Google in the first instance, and the writer thought it may have been from the Diana inquest but I can’t find it. I’ve admitted defeat and contacted the House of Commons information office.
  • 12pm: We’ve decided to include both the Jan-Dec and financial year totals for amputations in the Datablog figures, so Nina and I amended the spreadsheet and added a note to explain discrepancies.
  • 12.15pm: Amended the table of wounded data attached to the Datablog post on British casualties and relaunched the article.
  • 1pm: A few weeks ago my job share, Lauren, worked on a crowdsourced Datablog post about people who have refused honours, and a reader has emailed with some new information, so I added it to the spreadsheet.
  • 1.30pm: Proper lunch break – there’s a charity cake sale on by the canteen today so I treated myself to a slice of cherry loaf, yum.
  • 2.30pm: Panic over regarding the missing quote, the journalist has located it himself (a bit embarrassing!).
  • 2.30pm: Working on From the archive, the ‘on this day’ series that we publish online (with a piece printed in Saturday’s Guardian comment pages). Found two too-long pieces for the end of Feb (Mickey Mouse in 1935 and trials of using computers in 1986) which need trimming.
  • 4pm: Journalist query – tracking down a Guardian article from the 1990s – pulled together a few possibles from the text archive and Factiva.
  • 4.30pm: Graphics wanted some help finding military stats for an interactive, so I borrowed the Military Balance and photocopied the relevant pages. A bit tedious but much preferable to typing it all out!
  • 5.25pm: A quick one to finish – a PDF of an Observer spread from a fortnight ago (from the text archive).
  • 5.30pm: Leaving on time for the first time in a long time. Sounds like a country song.
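The amputation figures at 10.30am and 12pm are easier to follow with a toy example. This is a minimal Python sketch, not the spreadsheet we actually use: it sums quarterly counts into Jan-Dec totals and into financial-year totals, and flags any period that contains a withheld quarter or is missing quarters. The numbers are made-up placeholders, not DASA data, and it assumes the UK financial year runs April to March.

```python
from collections import defaultdict

# Made-up placeholder figures, NOT real MoD/DASA data.
# Keys are (calendar year, quarter): quarter 1 = Jan-Mar, ..., 4 = Oct-Dec.
# None marks a quarter withheld because the count was below five.
quarterly = {
    (2010, 1): None, (2010, 2): 9, (2010, 3): 7, (2010, 4): 14,
    (2011, 1): 8,
}

def calendar_year(year, quarter):
    """Jan-Dec reporting: the quarter doesn't change the period."""
    return str(year)

def financial_year(year, quarter):
    """UK financial year (April-March): Jan-Mar belongs to the previous year's FY."""
    start = year if quarter >= 2 else year - 1
    return f"{start}/{str(start + 1)[-2:]}"

def totals(figures, period_of):
    """Sum quarterly counts per period.

    Returns {period: (total, incomplete)}, where incomplete is True if any
    quarter in the period was withheld or fewer than four quarters are present.
    """
    sums = defaultdict(int)
    quarters_seen = defaultdict(int)
    suppressed = set()
    for (year, quarter), count in figures.items():
        period = period_of(year, quarter)
        quarters_seen[period] += 1
        if count is None:
            suppressed.add(period)
        else:
            sums[period] += count
    return {
        p: (sums[p], p in suppressed or quarters_seen[p] < 4)
        for p in sorted(quarters_seen)
    }

print(totals(quarterly, calendar_year))
print(totals(quarterly, financial_year))
```

With these placeholder numbers the 2010/11 financial year comes out complete while calendar year 2010 does not, the same kind of mismatch that stops the annual totals tallying.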

Read today: The engine of serendipity, a 2006 post from Nicholas Carr’s Rough Type blog (via @lilianedwards); Jonathan Franzen: e-books are damaging society in the Telegraph; Save Our Libraries campaign one year on from the Guardian Books blog (via @SimonXIX); Library Day in the Life day one by Nicole Brock at Odd Librarian Out; Library Day in the Life part 2 by Tina Reynolds.

#Libday8: Day one (30 January)

Mac computer screen in the Guardian library

The view from my desk

Library Day in the Life Round 8

Senior researcher, Guardian News & Media (news library)

  • 9am: Checked Twitter on the bus on the way in to work, to find out what’s in the news this morning (I need to be aware of what’s going on, and it gives me an idea of what I might be asked to work on later). Work doesn’t begin when you enter the office any more!
  • 9.45am: Deleted last week’s emails – I don’t work on Thursdays or Fridays so my inbox fills up with old queries.
  • 10am: Picked up an emailed query for a land registry search, left over the weekend, and emailed the team to say I’m doing it (that might seem redundant – there are only six of us and we all sit together – but we work different shifts and my job share won’t pick up the email until Thursday, so it pays to attach a name to every job).
  • 10.15am: Ran the land registry search – I haven’t used it much so it was good to get a bit of practice – and emailed results to the journalist.
  • 10.20-11am: I’ve been running a Guardian Datablog post on the film awards season, something I initiated a few years ago, using a Google spreadsheet to track all the key nominees and winners leading to the Oscars in February. This weekend was the Directors Guild and Screen Actors Guild awards, so I added the winners to the spreadsheet and added a para about them to the article (the DGA winner usually wins the Oscar). The post has tables for best actor and actress attached, and I edited these so that the winners stood out in bold. I also created a new table, for best director nominees. Then I relaunched the post (with today’s date) and told the web team about it so they can attach the story to any relevant content.
  • 10.30am: Michel Hazanavicius doesn’t have a keyword on the website yet, so I emailed our keyword manager to request one. He’s on it!
  • 11am: Quick department meeting to discuss jobs coming up this week.
  • 12pm: It’s the 40th anniversary of Bloody Sunday so I wrote a quick blogpost for From the archive, with Guardian coverage from the time. Once I’d prepped it I ran it past a colleague and our SEO team, to sub it and improve my SEO!
  • 1pm: Checked the text for tomorrow’s From the archive piece against the original, to weed out any stray commas or spelling mistakes. This is one of the tasks that is rota’d each week.
  • 1.15pm: Quick lunch! At my desk, because I’m going to a staff briefing at half one. I’ll try and sneak some time away from the screen later.
  • 1.20pm: The SEO team are happy with my Bloody Sunday post so I launched it, then tweeted the link from our department account (@guardianlibrary).
  • 1.30pm: Company-wide meeting about an upcoming Guardian event.
  • 3pm: Took a query from a journalist about MPs prosecuted following the expenses scandal (we get most queries via phone or email but this one sidled up to my desk, which I like). I checked for each of the MPs on our text archive of national papers, and on Factiva (to catch regional coverage), and sent articles via email.
  • 4.15pm: Journalist query for background on Seb Coe, via phone – Factiva again.
  • 5pm: I checked over the From the archive pieces for Wednesday and Thursday while I had a bit of spare time.

Kept up with a few articles via Twitter today – Beyond books: what it takes to be a 21st century librarian and Online newspaper metrics? The grey lady doth protest too much, methinks. I keep Twitter running in the background and check in periodically to keep up to date, don’t want to miss anything!

Reading: The role of a 21st century librarian

Great post from Emma Cragg and Katie Birkwood on what it takes to be a 21st century librarian, on the Guardian careers site (published a year ago), that I stumbled across today.

In all library roles customer service and communication skills are important. If anyone ever thought they’d become a librarian because they liked books or reading, they would be sorely disappointed if they did not also like people too.

So true! So much of the role is communicating the information you find to others.

Working week, 23-25 January 2012

  • Generating a Wordle from a hashtag: Education asked on Tuesday if we could create a word cloud from the questions asked on Twitter using the #askgove hashtag. See my post for more info on how I prepped it (and why it didn’t run).
  • Awards nominations: The Oscar nominations were announced on Tuesday, streamed live on the web. I prepped the article first thing, added the nominations to our spreadsheet as they came in then amended the article before publishing (controversially no nomination for Tilda Swinton!). My computer decided to remove itself from the server ten minutes before the announcement, cue much gnashing of teeth, but luckily the old ‘turn it off then turn it on again’ trick worked. I had a bit of a struggle creating summary tables for the page (thanks Critics’ Choice for nominating six best actors) but Ami showed me a nifty way of narrowing the columns. And I forgot to change the date (so that the article jumps to the top of the Datablog list), but handily someone else noticed and fixed it! Quite exciting launching a story in real time, but lessons to be learned about paying attention to the little details.
  • Pre-emptive blogging: I’m trying to work on a few blogposts in advance, so I’ve been looking for content on the Queen’s succession and Valentine’s day.
  • Journalist queries included recent comment on the health and social care bill, a 1994 article from the Modern Law Review (luckily one of the free online ones), writing bullet points on some Olympic sports, corrections, polls on satisfaction with the NHS, a comparative health report, profiles of Aki Kaurismaki, a fact check on Woody Harrelson and background on the judicial system (who heads it, who regulates it, law schools).

Generating a word cloud (or not) from a Twitter hashtag

Word cloud showing most common questions under #askgove

Sample #askgove word cloud created from around 2,500 tweets

Education asked last Tuesday if we could create a word cloud on Friday from the questions asked on Twitter using the #askgove hashtag. One of those jobs that seems simple on the surface but isn’t!

  • Problem one – by Tuesday there were already thousands of tweets, and Twitter will only allow you to search so far back on a keyword.
  • Problem two – they wanted the cloud generated on Friday (when they go to print) so they could include as many #askgove questions as possible, which meant checking for new tweets every couple of hours during the week to compile an immense list.
  • Problem three – because there were so many tweets, it was impossible to go through and weed out all the extraneous words like reply, retweet, favorite, open, askgove before generating the cloud, to say nothing of all the stop words (and, a, the…). They wanted a cloud that highlighted the key questions being asked, so no words relating to usernames, no why/will/what/when… and sadly no swearing!
  • Problem four – I don’t work on Fridays.

I got as far as I could with it – I searched for #askgove on Twitter and pasted the available list of tweets so far into a program called word counter, to generate a list of words ranked by frequency. That weeded out some of the basic stop words. But how to turn that into a Wordle? I could see the most popular terms, but each one appears only once in the counter’s output, so a Wordle made from it would give every word the same size and the cloud would be meaningless.
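The post doesn't describe exactly how the word counter works, but the counting and weeding step can be sketched in Python. This is a minimal illustration only: it assumes a small hand-made stop list (the Twitter interface words and question words from problem three plus a few common function words), strips @usernames and links with a regular expression, and runs on three invented sample tweets. A real stop list would be much longer.

```python
import re
from collections import Counter

# Hypothetical stop list: a few function words, plus the interface words and
# question words we didn't want in the cloud. A real list would be longer.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "you",
    "reply", "retweet", "favorite", "open", "askgove",
    "why", "will", "what", "when", "how",
}

def word_frequencies(tweets):
    """Count words across tweets, ignoring @usernames, links and stop words."""
    counts = Counter()
    for tweet in tweets:
        # Remove usernames and URLs before splitting into words.
        text = re.sub(r"@\w+|https?://\S+", " ", tweet.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Invented examples, not real #askgove tweets.
tweets = [
    "@someone #askgove why is the ebacc replacing GCSEs?",
    "#askgove what will happen to ICT in schools?",
    "#askgove when will schools get more funding for ICT?",
]
print(word_frequencies(tweets).most_common(3))
```

`most_common` gives the ranked list, but each word still appears only once in it, which is exactly the problem described above.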

Step forward production, specifically a systems editor, who showed me a nifty bit of code which takes the word counter list and returns each word, repeated as many times as the frequency number next to it. Weed out the words we don’t want (check the ones we’re not sure about – ebacc, ict, hei – on Twitter), paste this into Wordle and voilà! a word cloud.
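The systems editor's code isn't reproduced here, so this is a minimal Python sketch of the same idea rather than the code we actually used. It assumes the word counter's output has one word and its frequency per line, separated by whitespace (the real format may differ), and takes an optional set of words to weed out before everything is joined into one block of text to paste into Wordle.

```python
def expand_for_wordle(counter_output, exclude=()):
    """Turn 'word count' lines into text where each word repeats count times.

    Assumes each line holds a word followed by its frequency, separated by
    whitespace; the real word-counter output format may differ.
    """
    excluded = {w.lower() for w in exclude}
    words = []
    for line in counter_output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip headers, blank lines or malformed rows
        word, count = parts[0], int(parts[1])
        if word.lower() in excluded:
            continue  # weed out words we don't want in the cloud
        words.extend([word] * count)
    return " ".join(words)

# Illustrative input, not real #askgove counts.
counter_output = """\
schools 3
ict 2
retweet 4
ebacc 1
"""
print(expand_for_wordle(counter_output, exclude={"retweet"}))
```

Because Wordle sizes words by how often they appear in the pasted text, repeating each word by its count restores the ranking that the counter's list lost.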

I showed the process to the art director who works on Education, and mocked up a word cloud using the layout and colours she chose, to see whether it worked on the page.

I wrote detailed instructions for colleagues, and at their request I talked them through the process at my screen, so they could create the cloud without too many difficulties. They started to add to the list of tweets at the end of Wednesday (while I was still in, to check they’d got the process right).

And then…

…the word cloud was dropped from the supplement. This happens fairly often in journalism – a story is superseded by breaking news, the space is needed for advertising or a better alternative presents itself. The reason in this case was space – the word cloud simply didn’t work in the space available on the page. And they let us know early on Thursday, so my colleagues didn’t spend too long on it (sometimes we don’t get told at all).

So was it a waste of time? No. I learnt some valuable lessons, about how to generate word clouds but also about working with different departments (and colleagues) to create something for the paper.

Reflections

  • If something seems impossible at first glance don’t just dismiss it, there’s usually a solution and sometimes you have to put a bit of work in.
  • Ask for help if you don’t know how to do something – in such a big organisation there will usually be someone in the building who has the knowhow.
  • Collaboration is key – Education came to us at the beginning with a clear idea of what they wanted but little knowledge of how it could be done; I took it as far as possible then consulted someone with the technical knowledge; and collaborated on the design so the editors could make a final decision. Sharing knowledge led to a better end result, even though it wasn’t used.
  • Now I know how to create a word cloud from any volume of text, so if it comes up again it’ll be easy (she says…).
  • Walking colleagues through a complicated process is better than just emailing a list of instructions, which can be confusing (some people learn better with visual aids) and can seem a little superior (not everyone responds well to being told what to do remotely).

I think that last one is the lesson I should really take to heart!

#libday8: Under starter’s orders

Library Day in the Life round 8 starts today – I’ll be posting at the end of each day and tweeting throughout (@katy_bird). Looking forward to reading everyone else’s exploits!

Working week, 16-18 January 2012

  • Working on plans for Olympics coverage: We’ve been chatting this week about how we can cover the London 2012 Olympics from an archive perspective. We’ll be blogging some archive stuff, and tweeting too, hopefully with coverage from previous London Olympics. We’re trying to initiate our own projects, rather than being approached by others all the time – it’s much better to be involved from the start so we can be realistic about what is achievable (learning from past mistakes!).
  • Wikipedia blackout: We didn’t see a massive influx of queries on Wednesday, when Wikipedia was blacked out for 24 hours in protest against Sopa. Optimists would say that’s because our journalists are above using Wikipedia, but it’s more likely that they’d figured out ways around the blackout. Our encyclopaedias made a star turn for Guardipedia, when Patrick Kingsley fielded questions from readers stumped by the blackout. Shame there was no mention of the librarians (and lots of library clichés!), but he did give us a shout out on Twitter.
  • Journalist queries included a 1996 article on the Olympics, Syria in numbers, recent social stories on China, examples for a panel on home experiments gone wrong, interviews and reviews for Russell Tovey and Jaime Winstone, net % change of GDP over time, MP quotes on the Work Programme and a land registry search.

 

Changes to From the archive

Last week the Guardian underwent a modest restyling, with several pages stripped back. As a consequence, our From the archive column will no longer appear in the print version of the paper (except on Saturdays), but we will still be posting it online.

While there’s more cachet to having a column in the paper, there are advantages to working web-only.

  • The word limit isn’t as restrictive, so we won’t need to edit a good piece down to 480 words, or tack on an unrelated article if it’s too short (although we don’t want to start posting 1,500-word essays either).
  • We can play around with the format, using strong graphics or images if we find them instead of text.

There’s some extra work involved in uploading articles straight to the web though.

  • The pieces don’t run past a sub-editor, so we need to pay more attention to the copy, comparing it with the original article to make sure there are no missing words or stray commas.
  • Sometimes we’ll have to write our own headlines, where the original doesn’t have one or has a poor one (19th century articles tend to be wordy).

I worked on the first batch of web-only articles last week and found a few errors in the texts. We’ve changed the rota to take that into account, so one person preps the article for uploading and someone else subs it, so hopefully we’ll catch most of the mistakes before we launch! I’ll be paying more attention to it when I find articles from now on, too.