Reading: February’s CILIP Update

Initial thoughts on the latest issue of Update (reflections and more ordered thoughts will follow next week).

  • Shift to big data faces skills shortage, p7 – survey of big data community shows 3/4 felt there weren’t enough skilled workers in the UK; “‘when you’re on the cutting edge of technology, you have to be teaching yourself most of the time’” – Manu Marchal, Acunu Director; “8 out of 10 said that on-the-job training was the best way to ensure skills were up-to-date”; “significant majority [70%] felt there was a knowledge gap between big data analysts or managers and decision makers” – knowledge gap because “‘technology is constantly evolving, so management, like practitioners, are often not aware what can be achieved with these technologies.’ [Manu Marchal]” (applicable to any technological advance).
  • Backlash against volunteer report, p9 – “Without skilled staff a library is a shadow of its former self.” – Phil Bradley
  • Information matters, p13 – “As library and information professionals, we each have a vital role promoting the best effective, ethical, legal and literate use of data and information.” – Peter Griffiths
  • Global course to improve information literacy, p14 – Unesco course to teach importance of media and information literacy to educators. “We live in a world where the quality of information we receive largely determines our choices and ensuing actions, including our capacity to enjoy fundamental freedoms and the ability for self-determination and development.” – Janis Karklins
  • Copyright changes face challenge, p17 – new copyright proposals would make it easier to digitise content for preservation, which is good for archives, BUT news agencies and media archives are opposing the moves because they risk their ability to monetise archive content and would allow organisations with a licence to exploit content without prior consent. Should we be protective of Guardian copyright, or happy that more content can be preserved, even if it bypasses our exclusivity?
  • VP’s column, p18 – some great ideas for engaging teens with reading (check out Excelsior Award)
  • E-books: finding the way forward, p33 – Christopher Platt from NYPL highlights how important it is in current climate to collaborate “understanding where publishers and content producers are coming from is crucial to finding a way forward” – in this case, on the issue of e-books, but applies to any sticking point with other departments
  • Moving up the value chain, p39 – Laura Woods’ column – “Librarians are skilled at taking complex information, synthesising it and representing it in a manageable format for a variety of audiences.” It is vital to proactively seek out new roles and specialise, becoming the “go-to person” and making yourself invaluable to your company. But it is also vital to outsource menial tasks that take time away from more specialised jobs – “Reframing what we do is crucial to ensuring that we have a future as a profession. I believe this is true of librarians in every sector. To prove our value, the first step is ensuring that every job we do adds value. Cutting out as many low-value jobs as we can allows us to move further up the value chain.”
  • Informed advocates – becoming agents for change, p46 – again, increased collaboration is important. It’s also important to value ourselves more – we aren’t “service providers” to more important staff; we are just as qualified and professional and should approach business relationships as equals – “Personally, I’d rather people didn’t ‘use’ me. When I hold at least as many academic and professional qualifications as those I work with, collaboration between equals is what we need to advocate for. It’s not about ‘support’ either – it’s about being part of a team and ensuring that the skills I possess are clearly recognised and seen as essential to the work of a team. Is it sensible to go on describing the work we do as a ‘service’ when we are seeing ‘services’ being outsourced?” – Bernard Barrett

Training: searching statistics on ons.gov.uk

5 February 2012, CILIP HQ (organised by CILIP Information Services Group)

Notes on the day

Geoff Davies, Implementation Manager at the ONS, gave a run-through of how to navigate the newly redesigned ons.gov.uk. Recent improvements include new search functionality, additional synonyms and acronyms, and better navigation.

  • Several new elements on the homepage will be useful for headline figures – the “carousel” in the centre which announces the latest big releases, and the Key figures panel on the right which is a quick way of accessing the most up-to-date stats for GDP, unemployment etc.
  • The UK Publication Hub (link at bottom of landing page) holds all government data, not just that held by ONS.
  • ONS YouTube videos give explanations of big releases, and the new interactives are a good way of interrogating data.
  • Links to the previous site are obsolete, so if you’ve saved a URL it won’t redirect to the new site, but all the statistical releases have been carried over, so they will be there if you dig deep enough.

Geoff then outlined the basic structure of the ONS site, which is a simple nested hierarchy:

  • Business area (section) folder -> each publication has a folder -> calendar entry for each edition -> edition folder -> all content “nuggets” released on that date, e.g. charts, data tables, summary, statistical bulletin etc.
  • Every edition published to the site has a separate release page, which goes live on the publication date (the release calendar includes future publications). Everything relating to that release is accessible from the page – datasets and reference tables are listed at the bottom of the page, and contact details for a named person responsible for that release are to the right.
  • The redesigned theme pages, which are launching shortly and will be rolled out gradually across each theme, are simplified and easier to understand, and much more visual than the current text-based version. A moving carousel, in the centre, gives the most recent data. They are a work in progress and will be improved as more pages are updated.

Geoff gave a quick run-through of the navigation tabs across the top of the site:

  • Browse by theme – alphabetical index of themes -> individual theme pages, with the most relevant or important content at the top.
  • Publications – chronological list, with filters on the right to narrow down content.
  • Data – chronological list, search for datasets and reference tables here (not available in publications list).
  • Release calendar – all releases, chronologically, including future releases (the landing page only includes big releases). If you click through to a release page there’s a link to all editions at top right, to access previous data.
  • Guidance and methodology – gives background on the ONS and data collection, classifications etc.
  • Media Centre – includes official statements and releases, and letters correcting misinterpretations of stats in the media.
  • About ONS – most useful is the ad hoc research undertaken by ONS, which isn’t searchable in the publications indexes. Go to Publication Scheme under What We Do, then Published Ad Hoc Data on the left.

Continuing problems with the site

The main issue users have raised since the redesign is difficulty in finding content. The ONS has decentralised publishing, which means each department is responsible for its own releases (around 460 staff contribute to the site). This has led to inconsistency: some staff are reluctant to change old methods or aren’t interested in web standards, and some are just too busy. The ONS is working on solutions:

  • training staff on how to tag content with six or seven most useful keywords (too few, or too many irrelevant ones, mean weaker search results), and improving the metadata.
  • publishing support team to help departments who are too busy or uninterested.
  • health checks are run on content regularly.
  • there is pressure from management to conform to the new standards.

Practical examples

We ran through some real search queries for tips on searching the site, with assistance from a member of the customer services team (whose name I missed, sorry!). The main advice was to search through the release calendar using filters as necessary (selecting ‘last 5 years’ clears future releases from the list), and to use the ‘all editions’ link on each release page to locate time series data.

Unfortunately, the practical examples just proved that the search functionality of the site still needs improvement (if a roomful of information professionals struggles to find data you have a problem!). Advising users to call the customer services team with any queries is helpful but no use in a high pressure environment where data is needed within hours, not days – what I really needed were ways of finding the stats myself.

Reflections

  • The redesigned ons.gov.uk site is much cleaner and simpler than the old version, and easier to navigate, but it’s still difficult to actually find specific data. It’s a shame the ONS didn’t take advantage of having a room full of information professionals to interrogate the system further and to make notes of improvements needed.
  • Some of the problems the ONS are facing are familiar – they’ve decentralised uploading of content, but some staff are reluctant to adopt new techniques and others are over-keen and tag excessively. This is true of other new technologies being adopted across many library sectors (certainly it applies to social media in the news industry). It’s an issue of good training and perseverance with the new standards, and having support from management is vital.
  • Some issues with the redesign are similar to those we’ve experienced in relaunching our intranet recently – lack of redirects from old pages, decentralising, need for training.

Applying what I learned

  • The key figures and carousel on the front page of ons.gov.uk will be incredibly useful for finding the most recent headline data quickly (a common query).
  • The new theme pages will be very useful once they are launched, as a quick way to access key figures on a topic (another common query).
  • I’ll bookmark the ad hoc data page as an extra location to check for data.
  • The training also offered some good ideas on how to ensure consistently good content and metadata, which we could apply to any new roles that our department undertakes.

#chartership chat on Twitter – the evaluative statement, 29 March 2012

Date and time: 29 March, 6.30pm-7.30pm BST
Topic: the evaluative statement
Participants: 14
Tweets: about 160

I tuned in to the first #chartership chat on Twitter last month but missed the last two, so it was great to get involved again. Tonight’s theme was the evaluative statement, and while there weren’t as many of us present this week (@joeyanne was missing and we were competing with the hottest day of the year, as @cjclib pointed out!), @tinamreynolds did a great job keeping the conversation going and the discussion was just as fast-flowing and crammed with useful tips.

@joeyanne will be archiving the tweets as usual and you can still read write-ups of the previous #chartership chats. The next chat is on 12 April at 6.30pm BST, and the theme will be the mentor/mentee relationship.

16 February, #Chartership chat on Twitter blogpost by @joeyanne
Storify on #chartership chat by @ellyob
1 March, Chatting about Chartership blogpost by @el399
17 March, collecting and reporting evidence blogpost by @Library_Quine

I’ve taken the approach of previous bloggers and tried to pull out the main areas we discussed, as well as the best practical tips for writing your statement.

Planning and drafting the evaluative statement

There was some debate over when to start thinking about the evaluative statement. @Misteemog collected a mass of evidence first, drew it together in the statement, writing about each item, then “pruned the best bits”. @Readyourbook suggested arranging your evidence into a coherent order first, then writing a sentence about each piece of evidence as a first draft:

@Readyourbook on drafting statement

Some said they’d been thinking about the evaluative statement while they were collecting evidence – @ellyob jots down ideas for the statement under each of her chartership objectives, alongside possible evidence. @tinamreynolds and @Readyourbook both wished they’d started thinking about the statement at an earlier stage. Whether or not you draft your statement as you’re going along, we all agreed it was advisable to think about the criteria as well as your PPDP goals while you’re gathering evidence.

@Readyourbook pointed out that drafting the statement can help you to focus on reflection, and to weed out evidence that doesn’t add anything to your portfolio.

What to focus on

I’ve reached the point where I want to draft my evaluative statement, but I’m struggling to set out a framework, so it was great to get some input from other chartershippers (charterers? Hmm).

@AnabelMarsh “parachuted in” to pass on some advice from a recent chartership meeting she attended. The assessors she met prefer the statement to be based on the criteria, because it makes their job easier, though they’re not opposed to other approaches. @annetteearl followed this framework, writing 250 words on each of the four criteria.

We agreed it didn’t matter if your evidence applied to more than one of the criteria – they’re bound to cross over (most activities, as @tinamreynolds pointed out, will count as commitment to CPD), and applying to multiple criteria can strengthen your evidence – although it may make writing the statement harder!

If there isn’t room for in-depth reflection in the statement, where else can you put it?

The statement is limited to 1,000 words, so there isn’t much room for proper reflection – @Schopflin repeated the mantra, “Make every word count”, and pointed out that “your statement should be supported by evidence, not contain it”. @Misteemog was advised to think of the statement as an executive summary of your application.

@tinamreynolds posed a question:

@tinamreynolds on reflective writing in evidence

@johnmcmahon31 writes a reflective report on each event he attends. The portfolio I currently have on loan from CILIP (by Simon Ward) makes very good use of this – every training day and course is written up in a report, including aims and achievements, so the author is reflecting on lessons he’s learned and applied in the workplace as well as just describing the experience. The statement can then be limited to one or two lines for each objective.

The CV is another place to add reflection:

@katy_bird on reflective writing in CV

By describing your key achievements in each job role and how you applied what you learned on training courses to the workplace, you can save space in your statement. Four pages is a lot to play with; even if you’re as prolific as @tinamreynolds and have reams of training to draw on, you can add a line or two of reflection on the most useful courses if you’re selective.

@Readyourbook suggested adding some reflection to an explanatory note at the top of each piece of evidence, to make it absolutely clear to the assessors why the evidence is included.

Other issues covered

  • @ellyob wondered how many objectives you should identify in the PPDP – most people had four or five development areas but could bundle different objectives together under each one
  • How much evidence do you put in the portfolio?
  • Ways of organising evidence – whiteboards, physical piles of evidence, post-it notes in a matrix
  • The benefits of including an explanatory statement at the top of each piece of evidence, making it clear why it is included
  • Reassurance that any changes to the chartership process following the Future Skills consultation won’t apply to anyone who is already registered for chartership
  • What to include in the CV

Top tips on writing the evaluative statement

  • It’s never too early to think about the evaluative statement – if you have it in mind while you’re gathering evidence it’ll be much easier to write
  • Think of your statement as an executive summary
  • Assessors prefer it if you base your statement on the criteria (though it’s not the only way) – the clearer you make it the easier their job is
  • It’s all about reflection not description – the place to describe activities is in the evidence…
  • …and if you go over the word limit try to fit more reflective writing into your evidence
  • Ensure everything in the portfolio is referred to in the statement – don’t let evidence sit alone
  • Arrange the statement with headings, bullet points etc to break up the text – it’s easier for the assessors to read and navigate (although long headings might eat up the word count!)

#Libday8 Day three (1 February)

Twitter on a Samsung Galaxy smartphone

Checking in with Twitter first thing

Library Day in the Life Round 8

Senior researcher, Guardian News & Media (news library)

I’ve realised some of the queries I’ve mentioned this week are vague in the extreme. I’m a bit nervous about posting too many details (names, for example) before the articles are published; not that I’ve been researching anything controversial, but some journalists like to keep quiet about what they’re writing! I might come back and add links once the pieces are in print.

  • 10am: First job of the day is to listen to the morning conference – an editorial meeting open to anyone on the paper, which they stream so you can watch on your desktop. They invite guest speakers in most weeks, and today it’s Lord Hunt, chairman of the PCC, who was at Leveson yesterday.
  • 10am: A bit of multitasking! We’ve been asked for a detailed package on Google for one of the editors, by lunchtime, so we’ve all taken sections (it’s quicker to share a big job if there’s a tight deadline). Used Factiva to search US/global press as well as UK, then emailed the docs.
  • 11am: Took an email query from a journalist I did some work for last week (some writers have their favourites among us!), trying to pin down the precise role of a government advisor (there’s some confusion over whether he still is one). I hit Google for some details then the text archive, for a reliable source.
  • 12pm: Another email query, looking for profiles and background on an interview subject. Back to the text archive and Factiva!
  • 12.15pm: A face-to-face query (from the feature writer who sits behind me!), on Michael Gove and his use of the insult “Trot!” yesterday. He needs it for tomorrow’s paper so this takes priority over the previous query…
  • 12.45pm: …which I’ll get back to now! It’s turning into another day spent on Factiva. I don’t want to engulf someone with a deluge of articles, but when I’m researching for someone who is planning an interview I try to provide enough substantial previous profiles to give them more than one viewpoint and lots of ideas. This interview subject is a writer so I’m sending reviews of his books as well, several articles he’s written and any recent news so the interviewer is up-to-date.
  • 2pm: Lunch!
  • 3pm: We’ve been asked to trace a quote that is being questioned by a reader, so back to Factiva and Google.
  • 3.15pm: A bit more work on an archive blogpost for the anniversary of the Queen’s accession to the throne in 1952, writing the bones of the article and prepping some of the images in Photoshop. I’ll have to finish it off quickly on Monday morning!
  • 3.30pm: Picked up a journalist query from email, looking for statistics on mental health issues for girls and women (eating disorders, depression, self-harm). I looked at a few charity websites, which led me to the Department of Health’s 2010-11 hospital episode statistics. The data will never be comprehensive, because cases only get recognised when people enter the healthcare system, but it gives an idea of the gender and age ranges where these conditions are most prevalent. The hospital episode statistics focused on eating disorders as one of last year’s topics of interest, which is helpful. I found reports of recent surveys on body image and self-harm on the text archive too, and comparative depression stats from the ONS Adult Psychiatric Morbidity study 2007.
  • 5.15pm: It’s been a busy day today! Just enough time to read through the draft of a From the archive blogpost a colleague has written (we try to check each other’s work because the blog doesn’t go through a subbing process before we publish).

Normally that would be me done for the week – I’ve been working three days a week since I had my son four years ago – but this Friday I’m going to news:rewired so I’ll have a day off and post a few thoughts on what I learn there over the weekend.

Read today: Hashtags for information professionals by Bethan Ruddock; Librarians as Agents of Democracy by Lauren Smith on the Walk You Home tumblr; the National Libraries Day event page; Laura’s Guide to chartership on the Dark Archive blog; the Guardian’s report on the Whoopensocker US dialects dictionary; and #libday8 posts from Nicole Brock, Lauren Smith, JoLibrariAnne and Rachel Bickley.

#Libday8 Day two (31 January)

Library Day in the Life Round 8

Senior researcher, Guardian News & Media (news library)

  • 8.45am: Caught up on Twitter on the bus in to work, and read a couple of really interesting #libday8 posts and articles (see below for links)
  • 9.30am: I had a journalist query waiting in my inbox when I logged on –  looking for a quote from Hansard (the House of Commons record). There’s an easy-to-use archive on the parliament.uk website which goes back to 1988, and there was a possible date (we like!) but it wasn’t the right one. I searched by MP on Hansard but couldn’t find the quote for that year’s session, so I checked our text archive and Google for the article referenced in the email, to see if I could gather any more info. This led me to a new date (1994), but you can’t search that far back by MP so I headed to the advanced search instead. I should have tried there first! Nothing came up when I searched for the keywords. I even whacked the quote into Google but no joy there either. I’ve asked the journalist for more information. And it seemed so simple…
  • 10.30am: Our trainee maintains a spreadsheet of casualty figures from Afghanistan which feeds into a Datablog post on British dead and wounded. I’m responsible for updating the running total in the article and relaunching it with an amended table when wounded figures are released. The spreadsheet also includes amputation figures, released quarterly by the MoD through DASA. They’re out today, and normally our trainee Nina will just copy over the new figure and tot up the annual total, but the MoD have changed their methodology so they now report amputations by financial year. This, and the fact that they don’t report quarterly figures of fewer than five, means our annual totals no longer tally with the official MoD numbers. I’m loath to switch to financial year (“there were xx amputations in 2010” is nicer journalistically than “there were xx amputations in 2010/11”) but as it stands our annual data is incomplete. I’ve added an explanatory note to the spreadsheet but I’m going to consult the Datablog editor to decide whether we should switch to the financial year rather than Jan-Dec.
  • 11am: Still working on the Hansard query! The journalist has provided the source of the quote (a pressure group report), so I’ve got more to go on – it seems the quote may not have been said in the House. I checked Google in the first instance, and the writer thought it may have been from the Diana inquest but I can’t find it. I’ve admitted defeat and contacted the House of Commons information office.
  • 12pm: We’ve decided to include both the Jan-Dec and financial year totals for amputations in the Datablog figures, so Nina and I amended the spreadsheet and added a note to explain discrepancies.
  • 12.15pm: Amended the table of wounded data attached to the Datablog post on British casualties and relaunched the article.
  • 1pm: A few weeks ago my job share, Lauren, worked on a crowdsourced Datablog post about people who have refused honours, and a reader has emailed with some new information, so I added it to the spreadsheet.
  • 1.30pm: Proper lunch break – there’s a charity cake sale on by the canteen today so I treated myself to a slice of cherry loaf, yum.
  • 2.30pm: Panic over regarding the missing quote, the journalist has located it himself (a bit embarrassing!).
  • 2.30pm: Working on From the archive, the ‘on this day’ series that we publish online (with a piece printed in Saturday’s Guardian comment pages). Found two too-long pieces for the end of Feb (Mickey Mouse in 1935 and trials of using computers in 1986) which need trimming.
  • 4pm: Journalist query – tracking down a Guardian article from the 1990s – pulled together a few possibles from the text archive and Factiva.
  • 4.30pm: Graphics wanted some help finding military stats for an interactive, so I borrowed the Military Balance and photocopied the relevant pages. A bit tedious but much preferable to typing it all out!
  • 5.25pm: A quick one to finish – a PDF of an Observer spread from a fortnight ago (from the text archive).
  • 5.30pm: Leaving on time for the first time in a long time. Sounds like a country song.
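The amputation figures are really a small arithmetic problem: the same quarterly numbers add up differently depending on whether you group them by calendar year or by financial year (April to March), and a suppressed quarter leaves a hole in any total that includes it. Here’s a rough Python sketch of keeping both totals side by side, as we decided to at 12pm. It isn’t how our spreadsheet actually works; the function name and data layout are hypothetical, and the figures in the example are invented, not MoD data.

```python
from datetime import date


def calendar_and_financial_totals(quarterly):
    """Sum quarterly counts into calendar-year and UK financial-year totals.

    `quarterly` maps the first day of each quarter to a count, or to None
    where the figure was suppressed (fewer than five). Each total is a
    (sum, complete) pair; complete is False if any quarter was suppressed.
    """
    calendar, financial = {}, {}
    for start, count in quarterly.items():
        # UK financial years run April to March, so Jan-Mar belongs to the
        # financial year that began the previous April.
        fy_start = start.year if start.month >= 4 else start.year - 1
        fy_label = f"{fy_start}/{str(fy_start + 1)[-2:]}"
        for totals, key in ((calendar, str(start.year)), (financial, fy_label)):
            total, complete = totals.get(key, (0, True))
            if count is None:
                totals[key] = (total, False)  # suppressed: total is incomplete
            else:
                totals[key] = (total + count, complete)
    return calendar, financial


# Invented quarterly figures, for illustration only.
example = {
    date(2010, 1, 1): 10,
    date(2010, 4, 1): 12,
    date(2010, 7, 1): None,  # suppressed: fewer than five
    date(2010, 10, 1): 8,
    date(2011, 1, 1): 6,
}
cal, fin = calendar_and_financial_totals(example)
```

Any total flagged as incomplete is exactly the kind of figure that needed an explanatory note in our spreadsheet.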

Read today: The engine of serendipity, a 2006 post from Nicholas Carr’s Rough Type blog (via @lilianedwards); Jonathan Franzen: e-books are damaging society in the Telegraph; Save Our Libraries campaign one year on from the Guardian Books blog (via @SimonXIX); Library Day in the Life day one by Nicole Brock at Odd Librarian Out; Library Day in the Life part 2 by Tina Reynolds.

#Libday8: Day one (30 January)

Mac computer screen in the Guardian library

The view from my desk

Library Day in the Life Round 8

Senior researcher, Guardian News & Media (news library)

  • 9am: Check Twitter on the bus on the way in to work, to find out what’s in the news this morning (I need to be aware of what’s going on, and it gives me an idea of what I might be asked to work on later). Work doesn’t begin when you enter the office any more!
  • 9.45am: Deleted last week’s emails – I don’t work on Thursdays or Fridays so my inbox fills up with old queries.
  • 10am: Picked up an emailed query for a land registry search, left over the weekend, and emailed the team to say I’m doing it (that might seem redundant – there are only six of us and we all sit together – but we work different shifts and my job share won’t pick up the email until Thursday, so it pays to attach a name to every job).
  • 10.15am: Ran the land registry search – I haven’t used it much so it was good to get a bit of practice – and emailed results to the journalist.
  • 10.20-11am: I’ve been running a Guardian Datablog post on the film awards season, something I initiated a few years ago, using a Google spreadsheet to track all the key nominees and winners leading to the Oscars in February. This weekend was the Directors Guild and Screen Actors Guild awards, so I added the winners to the spreadsheet and added a para about them to the article (the DGA winner usually wins the Oscar). The post has tables for best actor and actress attached, and I edited these so that the winners stood out in bold. I also created a new table, for best director nominees. Then I relaunched the post (with today’s date) and told the web team about it so they can attach the story to any relevant content.
  • 10.30am: Michel Hazanavicius doesn’t have a keyword on the website yet, so I emailed our keyword manager to request one. He’s on it!
  • 11am: Quick department meeting to discuss jobs coming up this week.
  • 12pm: It’s the 40th anniversary of Bloody Sunday so I wrote a quick blogpost for From the archive, with Guardian coverage from the time. Once I’d prepped it I ran it past a colleague and our SEO team, to sub it and improve my SEO!
  • 1pm: Checked the text for tomorrow’s From the archive piece against the original, to weed out any stray commas or spelling mistakes. This is one of the tasks that is rota’d each week.
  • 1.15pm: Quick lunch! At my desk, because I’m going to a staff briefing at half one. I’ll try and sneak some time away from the screen later.
  • 1.20pm: The SEO team are happy with my Bloody Sunday post so I launched it, then tweeted the link from our department account (@guardianlibrary).
  • 1.30pm: Company-wide meeting about an upcoming Guardian event.
  • 3pm: Took a query from a journalist about MPs prosecuted following the expenses scandal (we get most queries via phone or email but this one sidled up to my desk, which I like). I checked for each of the MPs on our text archive of national papers, and on Factiva (to catch regional coverage), and sent articles via email.
  • 4.15pm: Journalist query for background on Seb Coe, via phone – Factiva again.
  • 5pm: I checked over the From the archive pieces for Wednesday and Thursday while I had a bit of spare time.

Kept up with a few articles via Twitter today – Beyond books: what it takes to be a 21st century librarian and Online newspaper metrics? The grey lady doth protest too much, methinks. I keep Twitter running in the background and check in periodically to keep up to date; I don’t want to miss anything!

Reading: The role of a 21st century librarian

Great post from Emma Cragg and Katie Birkwood on what it takes to be a 21st century librarian, on the Guardian careers site (published a year ago), that I stumbled across today.

In all library roles customer service and communication skills are important. If anyone ever thought they’d become a librarian because they liked books or reading, they would be sorely disappointed if they did not also like people too.

So true! So much of the role is communicating the information you find to others.

Working week, 23-25 January 2012

  • Generating a Wordle from a hashtag: Education asked on Tuesday if we could create a word cloud from the questions asked on Twitter using the #askgove hashtag. See my post for more info on how I prepped it (and why it didn’t run).
  • Awards nominations: The Oscar nominations were announced on Tuesday, streamed live on the web. I prepped the article first thing, added the nominations to our spreadsheet as they came in then amended the article before publishing (controversially no nomination for Tilda Swinton!). My computer decided to remove itself from the server ten minutes before the announcement, cue much gnashing of teeth, but luckily the old ‘turn it off then turn it on again’ trick worked. I had a bit of a struggle creating summary tables for the page (thanks Critics’ Choice for nominating six best actors) but Ami showed me a nifty way of narrowing the columns. And I forgot to change the date (so that the article jumps to the top of the Datablog list), but handily someone else noticed and fixed it! Quite exciting launching a story in real time, but lessons to be learned about paying attention to the little details.
  • Pre-emptive blogging: I’m trying to work on a few blogposts in advance, so I’ve been looking for content on the Queen’s accession and Valentine’s Day.
  • Journalist queries included recent comment on the health and social care bill, a 1994 article from Modern Law Review (luckily one of the free online ones), writing bullet points on some Olympic sports, corrections, polls on satisfaction with the NHS, a comparative health report, profiles of Aki Kaurismaki, a fact check on Woody Harrelson and background on the judicial system (who heads it, who regulates it, law schools).

Generating a word cloud (or not) from a Twitter hashtag

Word cloud showing most common questions under #askgove

Sample #askgove word cloud created from around 2,500 tweets

Education asked last Tuesday if we could create a word cloud on Friday from the questions asked on Twitter using the #askgove hashtag. One of those jobs that seems simple on the surface but isn’t!

  • Problem one – by Tuesday there were already thousands of tweets, and Twitter will only allow you to search so far back on a keyword.
  • Problem two – they wanted the cloud generated on Friday (when they go to print) so they could include as many #askgove questions as possible, which meant checking for new tweets every couple of hours during the week to compile an immense list.
  • Problem three – because there were so many tweets, it was impossible to go through and weed out all the extraneous words like reply, retweet, favorite, open, askgove before generating the cloud, to say nothing of all the stop words (and, a, the…). They wanted a cloud that highlighted the key questions being asked, so no words relating to usernames, no why/will/what/when… and sadly no swearing!
  • Problem four – I don’t work on Fridays.
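Problem two meant pasting in fresh search results every couple of hours, and each batch is likely to overlap with the last. Merging batches without double-counting can be sketched in a few lines of Python. This is a minimal illustration, not what we actually did: it assumes each batch is a list of tweet texts (no tweet IDs), and treats two tweets as duplicates when their text matches after collapsing whitespace.

```python
def merge_batches(batches):
    """Combine batches of pasted tweets, keeping the first copy of each tweet in order."""
    seen = set()
    merged = []
    for batch in batches:
        for tweet in batch:
            key = " ".join(tweet.split())  # collapse stray spaces and line breaks
            if key not in seen:
                seen.add(key)
                merged.append(tweet)
    return merged


# Two overlapping batches, as you'd get from searching a few hours apart.
morning = ["Why is ICT being scrapped? #askgove", "What about the EBacc? #askgove"]
afternoon = ["What about the EBacc? #askgove", "Are teachers being consulted? #askgove"]
print(len(merge_batches([morning, afternoon])))  # 3
```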

I got as far as I could with it – I searched for #askgove on Twitter and pasted the tweets available so far into a program called word counter, which generated a list of words ranked by frequency. That weeded out some of the basic stop words. But how to turn that into a Wordle? I could see the most popular terms, but each one appeared only once in the counter’s list (with its frequency next to it), so a word cloud made from that list would show every word at the same size and be meaningless.
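For anyone without a word counter program to hand, the counting step can be approximated in Python. This is a rough sketch, not the program I used: the stop-word list is a short illustrative one (an assumption; a real list would be much longer), the extra words are the Twitter interface words mentioned above, and anything starting with @ is treated as a username and skipped.

```python
import re
from collections import Counter

# Illustrative stop words only; extend as needed (assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "for", "on",
              "why", "will", "what", "when", "how", "you", "your", "i"}
# Twitter interface words from the list above, plus the hashtag itself.
EXTRA_WORDS = {"reply", "retweet", "favorite", "open", "askgove"}


def word_frequencies(tweets):
    """Return (word, count) pairs, most frequent first, skipping unwanted words."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[@#]?[\w']+", tweet.lower()):
            if token.startswith("@"):
                continue  # skip usernames
            word = token.lstrip("#").strip("'")  # keep hashtag text, drop stray quotes
            if word and word not in STOP_WORDS and word not in EXTRA_WORDS:
                counts[word] += 1
    return counts.most_common()


tweets = ["@bob Why is the #EBacc compulsory?", "Will the EBacc include ICT? #askgove"]
print(word_frequencies(tweets))
```

With the two sample tweets this should rank "ebacc" first with a count of 2, followed by "compulsory", "include" and "ict" with 1 each.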

Step forward production, specifically a systems editor, who showed me a nifty bit of code which takes the word counter list and returns each word, repeated as many times as the frequency number next to it. Weed out the words we don’t want (check the ones we’re not sure about – ebacc, ict, hei – on Twitter), paste this into Wordle and voilà! a word cloud.
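I don’t have the systems editor’s actual code, but the idea is simple enough to sketch. This hypothetical Python version assumes the counter output is one word and its count per line, separated by whitespace (the real format may differ). Because Wordle sizes words by how often they appear in the pasted text, repeating each word by its count restores the weighting the frequency list lost.

```python
def expand_frequency_list(lines, exclude=()):
    """Turn 'word count' lines into text where each word is repeated count times."""
    excluded = {w.lower() for w in exclude}
    words = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        word, count = parts[0], int(parts[1])
        if word.lower() in excluded:
            continue  # words we've decided to weed out
        words.extend([word] * count)
    return " ".join(words)


counter_output = ["ebacc 3", "teachers 2", "askgove 5", ""]
print(expand_frequency_list(counter_output, exclude=["askgove"]))
# ebacc ebacc ebacc teachers teachers
```

The output can be pasted straight into Wordle; keeping the exclusion list as a parameter makes it easy to add words like usernames as they turn up.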

I showed the process to the art director who works on Education, and mocked up a word cloud using the layout and colours she chose, to see whether it worked on the page.

I wrote detailed instructions for colleagues, and at their request I talked them through the process at my screen, so they could create the cloud without too many difficulties. They started to add to the list of tweets at the end of Wednesday (while I was still in, to check they’d got the process right).

And then…

…the word cloud was dropped from the supplement. This happens fairly often in journalism – a story is superseded by breaking news, the space is needed for advertising or a better alternative presents itself. The reason in this case was space – the word cloud simply didn’t work in the space available on the page. And they let us know early on Thursday, so my colleagues didn’t spend too long on it (sometimes we don’t get told at all).

So was it a waste of time? No. I learnt some valuable lessons, about how to generate word clouds but also about working with different departments (and colleagues) to create something for the paper.

Reflections

  • If something seems impossible at first glance, don’t just dismiss it; there’s usually a solution, and sometimes you have to put a bit of work in.
  • Ask for help if you don’t know how to do something – in such a big organisation there will usually be someone in the building who has the knowhow.
  • Collaboration is key – Education came to us at the beginning with a clear idea of what they wanted but little knowledge of how it could be done; I took it as far as possible, then consulted someone with the technical knowledge; and we collaborated on the design so the editors could make a final decision. Sharing knowledge led to a better end result, even though it wasn’t used.
  • Now I know how to create a word cloud from any volume of text, so if it comes up again it’ll be easy (she says…).
  • Walking colleagues through a complicated process is better than just emailing a list of instructions, which can be confusing (some people learn better with visual aids) and can seem a little superior (not everyone responds well to being told what to do remotely).

I think that last one is the lesson I should really take to heart!