Generating a word cloud (or not) from a Twitter hashtag

Word cloud showing most common questions under #askgove

Sample #askgove word cloud created from around 2,500 tweets

Education asked last Tuesday if we could create a word cloud on Friday from the questions asked on Twitter using the #askgove hashtag. One of those jobs that seems simple on the surface but isn’t!

  • Problem one – by Tuesday there were already thousands of tweets, and Twitter will only allow you to search so far back on a keyword.
  • Problem two – they wanted the cloud generated on Friday (when they go to print) so they could include as many #askgove questions as possible, which meant checking for new tweets every couple of hours during the week to compile an immense list.
  • Problem three – because there were so many tweets, it was impossible to go through and weed out all the extraneous words like reply, retweet, favorite, open, askgove before generating the cloud, to say nothing of all the stop words (and, a, the…). They wanted a cloud that highlighted the key questions being asked, so no words relating to usernames, no why/will/what/when… and sadly no swearing!
  • Problem four – I don’t work on Fridays.

I got as far as I could with it – I searched for #askgove on Twitter and pasted the available list of tweets so far into a program called word counter, to generate a list of words ranked by frequency. That weeded out some of the basic stop words. But how to turn that into a Wordle? I could see the most popular terms, but they only occur once in the text generated by the counter so the word cloud would be meaningless.

Step forward production, specifically a systems editor, who showed me a nifty bit of code which takes the word counter list and returns each word, repeated as many times as the frequency number next to it. Weed out the words we don’t want (check the ones we’re not sure about – ebacc, ict, hei – on Twitter), paste this into Wordle and voilà! a word cloud.
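For anyone curious, the idea behind that bit of code can be sketched in a few lines of Python. This isn't the systems editor's actual code, which I don't have to hand. The function name, the "word count" line format and the sample words are all assumptions made for illustration, and a real word counter may lay out its list differently.

```python
def expand_counts(counter_output, unwanted):
    """Repeat each word as many times as its count, skipping unwanted words.

    Assumes each line looks like "word count" (hypothetical format).
    """
    words = []
    for line in counter_output.splitlines():
        parts = line.split()
        # Skip blank or malformed lines rather than guessing at them
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        word, count = parts[0].lower(), int(parts[1])
        if word in unwanted:
            continue
        words.extend([word] * count)
    return " ".join(words)


# Made-up sample in the assumed "word count" format
sample = """teachers 3
askgove 5
ebacc 2
retweet 4"""

print(expand_counts(sample, unwanted={"askgove", "retweet"}))
# prints: teachers teachers teachers ebacc ebacc
```

The output is plain text with each word repeated in proportion to its frequency, which is the kind of input a word cloud generator sizes words from. Keeping the unwanted words in a separate set means the list can grow during the week without touching the rest of the code.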

I showed the process to the art director who works on Education, and mocked up a word cloud using the layout and colours she chose, to see whether it worked on the page.

I wrote detailed instructions for colleagues, and at their request I talked them through the process at my screen, so they could create the cloud without too many difficulties. They started to add to the list of tweets at the end of Wednesday (while I was still in, to check they’d got the process right).

And then…

…the word cloud was dropped from the supplement. This happens fairly often in journalism – a story is superseded by breaking news, the space is needed for advertising or a better alternative presents itself. The reason in this case was space – the word cloud simply didn’t work in the space available on the page. And they let us know early on Thursday, so my colleagues didn’t spend too long on it (sometimes we don’t get told at all).

So was it a waste of time? No. I learnt some valuable lessons, about how to generate word clouds but also about working with different departments (and colleagues) to create something for the paper.

Reflections

  • If something seems impossible at first glance, don’t just dismiss it: there’s usually a solution, though sometimes you have to put a bit of work in.
  • Ask for help if you don’t know how to do something – in such a big organisation there will usually be someone in the building who has the know-how.
  • Collaboration is key – Education came to us at the beginning with a clear idea of what they wanted but little knowledge of how it could be done; I took it as far as possible then consulted someone with the technical knowledge; and collaborated on the design so the editors could make a final decision. Sharing knowledge led to a better end result, even though it wasn’t used.
  • Now I know how to create a word cloud from any volume of text, so if it comes up again it’ll be easy (she says…).
  • Walking colleagues through a complicated process is better than just emailing a list of instructions, which can be confusing (some people learn better with visual aids) and can seem a little superior (not everyone responds well to being told what to do remotely).

I think that last one is the lesson I should really take to heart!

Training: Tweetdeck, 22 November 2011

I’ve been using Twitter for a few years but I’m behind the times with my software! Following a talk about the use of Twitter, I signed up for a Tweetdeck intro session with John Stuttle, one of the Guardian’s systems editors.

Tweetdeck offers much more usability than the basic Twitter feed. Key for me is that it allows you to track more than one account at once (and from more than one social network), to tweet from more than one account simultaneously (say my personal and department ones) and to publish timed tweets (so we could set a tweet to launch the weekend’s From the archive in advance, for example).

John recommended signing up for a bit.ly account as well, which allows you to analyse the statistics on how many people used your link to click through to a story, versus other links, and use that to improve your tweeting. Bit.ly provides various other stats too – to view statistics and graphs just click the Analyze link at the top once you’re signed in, or click on Info Page next to an individual link.

 

CPD23 Thing 18: Jing, screen capture and podcasts

Maria’s Thing 18 post

Jing

I love the idea of Jing, and other screen capture software – being able to record a How to… would be of huge benefit to bossy old me (and any of my colleagues who have had to refer to a seemingly endless list of bullet points I’ve written).

Directing colleagues and users to a short video would save me from running through the same processes time and again, and would be a really useful tool for marketing the department (back to advocacy!). It certainly has potential.

The unable-to-download monster has reared its head again, but yay! there’s a non-downloadable option too (thanks Maria).

Screencast-o-matic

I had a quick go of recording a How to… – how to upload the From the archive column to the website. I made a bit of a schoolboy error – the recording box wasn’t big enough so every menu selection happened off screen. I didn’t record sound either, although I don’t think I’d use sound anyway (far too camera shy!).

The recording process is fairly intuitive though, and I’d definitely consider using it for all our how to… type material, once I’ve practised a bit more.

Podcasts

I’m not ready to record my own (and I’m not sure our users would be ready to listen to me either!) but I’m going to have a listen to the arcadia@cambridge podcasts, and ask around (okay, ask Twitter) if there are other good library podcasts out there. Anyone know any?

CPD23 Thing 17: The medium is the message (Prezi & Slideshare)

Ange’s Thing 17 post

I should say right off that my experience of using slides is very new (as in, I created my first PowerPoint slide this afternoon for a presentation tomorrow – I’ll let you know how it goes in the comments!). Reading Ange’s suggestions, and browsing SlideShare, (hopefully!) helped me to avoid some of the pitfalls of using slides.

Being new to presenting, I’m not sure I’m ready to use advanced software like Prezi. I can see the advantages (and I’m sure the audience would find my talk much more interesting) but I find it hard enough just getting my points in the right order, without playing around with the screen as well.

I will be taking a fuller look at Prezi once I’ve delivered my talk though – as a tool for sharing a presentation online it looks like it’s streets ahead of my boring PowerPoint slides. If I manage to revamp those I’ll post a link.

CPD23 Thing 14: Zotero, Mendeley and citeulike

Isla’s Thing 14 post

As usual I’m hampered by my inability to download software at work (the problem of working for a big company that thinks it can meet all your IT needs, without asking you what they are, grumble grumble*). So I’m skipping Zotero and Mendeley, and focusing on citeulike.

It’s hard to try the site out thoroughly because there isn’t an obvious application at work, but it was quick to get started and seems easy to use. It could also come in handy as a new resource for finding articles, as well as recording ones I’ve found elsewhere.

Reflections

At the moment, I don’t need a citation tool, but I’m really impressed with the ones on offer so if I need one in future (Chartership portfolio?) I’ll definitely take a closer look. Like Isla, I really wish they’d been around when I was at uni!

*I’m being unfair, but it is frustrating when you don’t have control over the tools you use!

CPD23 Thing 13: Google Docs, Wikis and Dropbox

Thing 13 post

Google Docs

I’ve been using Google Docs at work for nearly three years and it’s really revolutionised the way I do my job. That sounds a bit dramatic but the role has changed so much since we adopted it!

Partly that’s down to the Guardian becoming a more digital, interactive product. Partly it’s because I’ve become involved in the Datablog, which is powered by Google spreadsheets of data. But it’s also made existing jobs easier.

Before, when the department was working on a project together, we would compile info in an Excel spreadsheet or Word file. It would sit in someone’s public folder, but we’d enable it for multiple users and as long as you were in the office you could amend it.

Quite often though, the spreadsheet wouldn’t like being updated by two people at once and would crash, or create multiples, and someone would have to go in and fix it.

When the work was complete, we’d have to email the spreadsheet to the editor or journalist who’d requested the work, or move it to their folder, so that they could work on it. If they decided they needed to change the format or needed a different dataset, they’d send it back and we’d start again.

Now, project spreadsheets sit in Google Docs. They’re shared with everyone in the department, who can actually really genuinely update them at the same time without causing problems. We can share them with any writers or editors who need them, too, and everyone can access them from anywhere, even (though we don’t want to encourage working after hours!) from outside the office.

If the end product is a graphic or interactive, it can feed directly off the spreadsheet in Google Docs, so any updating can be done in real time and seen immediately on the page (like this Afghan casualties interactive, or one we did for 9/11 a few weeks ago). The Datablog feeds off Google spreadsheets for most of its content (this Man Booker Prize 2011 one is my latest baby).

The future is spreadsheets. No, really.

Dropbox

Because I already use Google Docs, and because I can’t download software at work, I’m going to skip Dropbox for now (sorry Dropbox).

Wikis

When I attended CILIP’s Umbrella conference a few months ago the most practical nugget I came away with was to adopt a wiki at work, as a way of sharing knowledge between colleagues (thanks Alan Brine and the wiki the67things).

We’ve tried a few ways of sharing department ‘how to…’s, but never hit on a formula that everyone likes and, more importantly, that everyone uses and contributes to. At the moment we use Google Docs, which works fine as document storage, but is clogged up with all our other docs. I think a wiki is the answer, so this is a great opportunity to try it out.

Unfortunately, to use MediaWiki you have to download the software, which I can’t do, so I had a go with PBworks instead. I set up an account – researchandinformation – and had a wander round to figure out the navigation. It’s not as intuitive as some software these days, but it’s easy to create pages, and hopefully I’ll be able to add it to the arsenal of tools we use in the office.

Reflections

Google Docs is always going to win out over Dropbox for me, because it is already central to my working life and because you don’t need to download any software to use it.

I’m determined to set up a wiki for use at work but I’ll need to speak to the rest of the department before I go off all gung ho. There’s no point adopting a new resource only for others to ignore it (which has happened previously with Delicious). I’m pretty sure there’s software already available within the company which doesn’t require download (can you tell that annoys me?), but if not PBworks will suit what we need.

I’ll continue to use existing wikis for career development – I take part in Library Day in the Life and I’m going to add myself to the Library Routes project. For now, this blog serves me well for recording my Chartership path, but I’m using a Google Docs spreadsheet to list CPD activities, and when I come to compiling my portfolio I’m sure I’ll use it more!

Like I said, spreadsheets are the future.