news:rewired #3: data mashing panel

Tony Hirst (@psychemedia), Open University blogger and sometime Guardian Datastore collaborator

The Guardian Datastore released a spreadsheet of MPs’ expenses data in a convenient format, BUT spreadsheets are hard to make sense of; pictures and graphs tell you more and add value

  • VISUAL REPRESENTATIONS of data are a valuable tool
  • can interact, have conversations with the data, explore stories within it – like mapping travel claims expenses to show where MPs travelled from, colour coded for level of claim

Have to make the data fluid – wire it into other web-based resources to add value

  • take data from Google spreadsheet (add &output=csv&range=B2:AH684, or whatever range you need, to the spreadsheet URL)
  • Many Eyes Wikified to visualise data, do calculations
  • might need to clean data first (get rid of £, & etc) – Yahoo Pipes can run a regex (regular expression), eg to replace & with AND, then output a cleaned CSV file
  • can filter data to pick particular sections eg 1 MP, using theyworkforyou MPs’ info

Google spreadsheet — Yahoo Pipes — Many Eyes Wikified — embed in blog etc.
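The fetch-and-clean steps above were done with Yahoo Pipes on the day; a rough equivalent can be sketched in Python. This is only an illustration, not the workflow Hirst demonstrated: the function names and the example.com URL are made up, and it assumes the spreadsheet is published at a URL that accepts the output and range parameters mentioned in the notes.

```python
import csv
import io
import re
from urllib.parse import urlencode


def csv_export_url(sheet_url, cell_range):
    """Append output=csv and range=... to a published spreadsheet URL."""
    separator = "&" if "?" in sheet_url else "?"
    params = urlencode({"output": "csv", "range": cell_range}, safe=":")
    return sheet_url + separator + params


def clean_cell(value):
    """Spell out ampersands; strip £ and commas from money-like values."""
    value = re.sub(r"\s*&\s*", " AND ", value)
    stripped = value.strip()
    if re.fullmatch(r"£?[\d,]+(\.\d+)?", stripped):
        value = re.sub(r"[£,]", "", stripped)
    return value


def clean_csv(text):
    """Run clean_cell over every cell and return the cleaned CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([clean_cell(cell) for cell in row])
    return out.getvalue()


# Hypothetical URL and made-up sample rows, for illustration only.
url = csv_export_url("http://example.com/pub?key=KEY", "B2:AH684")
cleaned = clean_csv('Name,Claim\n"Smith & Jones","£1,234.50"\n')
```

The cleaned text can then be saved or passed on to whichever visualisation tool comes next in the chain.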

Location data isn’t always easy to add so use a join

  • Google Fusion Tables / Dabble DB combines two spreadsheets with a common field (exact match) to match data eg MPs’ names and geo coordinates
  • can use Yahoo Pipes to plot coordinates of postcodes on a map (y:location)
  • can plot it on Google Earth
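To show what an exact-match join does, here is a minimal Python sketch. It is not how Google Fusion Tables or Dabble DB work internally; the function name, field names and sample rows are all invented for illustration. Note how a name that differs only in capitalisation fails to match, which is why the common field has to be identical in both tables.

```python
def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dicts where the key field matches exactly."""
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Merge the two records into one combined row.
            joined.append({**row, **match})
    return joined


# Made-up sample data: expenses in one table, coordinates in another.
expenses = [
    {"mp": "Example MP One", "claim": "1200"},
    {"mp": "Example MP Two", "claim": "800"},
]
locations = [
    {"mp": "Example MP One", "lat": "52.04", "lon": "-0.76"},
    {"mp": "example mp two", "lat": "51.50", "lon": "-0.12"},  # case differs
]

merged = join_on(expenses, locations, "mp")
# Only "Example MP One" appears in merged; the second name did not match.
```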

Can query Google spreadsheets if you import data

  • =importhtml("URL", "table", 1) – will import the first table on a web page (use straight quotes in the formula)
  • can then write queries on it (see Hirst’s recent blog for further details)
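The same import-then-query idea can be tried outside the spreadsheet. The sketch below is a loose Python analogue using only the standard library, not the spreadsheet function itself: the class and function names are invented, the HTML is a made-up sample, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return the index-th table (counting from 1, like the formula)."""
    grabber = TableGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.tables[index - 1]


# Made-up sample page, for illustration only.
page = (
    "<table><tr><th>MP</th><th>Claim</th></tr>"
    "<tr><td>Example MP One</td><td>1200</td></tr>"
    "<tr><td>Example MP Two</td><td>800</td></tr></table>"
)
rows = import_table(page, 1)

# A simple "query": keep data rows where the claim is over 1000.
big_claims = [row for row in rows[1:] if float(row[1]) > 1000]
```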

Useful sites for advice and how-tos:

Francis Irving (@frabcus) – all about “poking the beast”

A new sort of journalism: making decisions about datasets and building stories from websites – open FoI requests and campaign leaflets lead to stories

  • includes voting record analysis
  • for publicly submitted FoI requests eg allotment waiting lists – can search filetype:xls for spreadsheets
  • – indexed party campaign leaflets – can browse by constituency, set up email alert, do tineye reverse image lookup eg BNP
  • democracy club – signing up volunteers in every constituency to make election data available (leaflets etc)

Estonia is the goal – they publish all govt data, minutes have to be made available 20 minutes after the end of a meeting


  • Is there an international data source? UN is tidying up its data at the moment; wikileaks is good; International Telecommunication Union; can do deep Google searching for .xls using site:…
  • Can you strip data from PDFs? OCR software (Adobe?) could strip out basic .jpg data but would have to check for errors

1 thought on “news:rewired #3: data mashing panel

  1. Pingback: news:rewired conference, Jan 14 2010 « Librarian of tomorrow
