news:rewired #3: data mashing panel

Tony Hirst (@psychemedia), Open University blogger and sometime Guardian Datastore collaborator

The Guardian Datastore released a spreadsheet of MPs’ expenses data in a convenient format, but spreadsheets are hard to make sense of; pictures and graphs tell you more and add value

  • VISUAL REPRESENTATIONS of data are a valuable tool
  • you can interact with the data, have conversations with it and explore stories within it – for example, mapping travel expense claims to show where MPs travelled from, colour-coded by level of claim

Have to make the data fluid – wire it into other web-based resources to add value

  • take data from a Google spreadsheet as CSV (add &output=csv and &range=B2:AH684, or whatever range you need, to the URL)
  • Many Eyes Wikified to visualise data, do calculations
  • might need to clean the data (get rid of £, & etc) – Yahoo Pipes can do this, eg run a regex (regular expression) to replace & with AND, then output a cleaned CSV file
  • can filter data to pick particular sections eg 1 MP, using theyworkforyou MPs’ info
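The fetch, clean and filter steps above can be sketched in a few lines of Python instead of Yahoo Pipes. This is only an illustration under stated assumptions: the base URL, the column names (MP, Constituency, Travel) and the sample rows are made up, and the output and range parameters are simply the ones mentioned in the talk, not checked against any current Google URL scheme.

```python
import csv
import io
import re
from urllib.parse import urlencode


def csv_export_url(base_url, cell_range):
    """Append the output=csv and range parameters from the notes to a URL."""
    separator = "&" if "?" in base_url else "?"
    params = urlencode({"output": "csv", "range": cell_range}, safe=":")
    return base_url + separator + params


def clean_cell(text):
    """Drop pound signs and replace '&' with 'AND', as described in the talk."""
    text = re.sub(r"£", "", text)
    text = re.sub(r"\s*&\s*", " AND ", text)
    return text.strip()


def clean_and_filter(raw_csv, mp_name):
    """Clean every cell, keep only the rows for one MP, return cleaned CSV."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        cleaned = {key: clean_cell(value) for key, value in row.items()}
        if cleaned["MP"] == mp_name:  # hypothetical column name
            writer.writerow(cleaned)
    return out.getvalue()


# Made-up sample data standing in for the downloaded spreadsheet.
SAMPLE = (
    "MP,Constituency,Travel\n"
    'A. Example,Town & Country,"£1,200.50"\n'
    "B. Sample,Northside,£300.00\n"
)

if __name__ == "__main__":
    # Placeholder URL, not a real spreadsheet.
    print(csv_export_url("https://example.com/sheet?key=abc", "B2:AH684"))
    print(clean_and_filter(SAMPLE, "A. Example"))
```

In practice the text passed to clean_and_filter would come from downloading the CSV export URL rather than from an inline string.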

Google spreadsheet — Yahoo Pipes — Many Eyes Wikified — embed in blog etc.

Location data isn’t always easy to add so use a join

  • Google Fusion Tables / Dabble DB can combine two spreadsheets on a common field (exact match required) to match data, eg MPs’ names and geo coordinates
  • can use Yahoo Pipes to plot coordinates of postcodes on a map (y:location)
  • can plot it on Google Earth
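The join described above can be illustrated without either tool. The sketch below is a plain Python stand-in, not how Fusion Tables or Dabble DB work internally; the field names and rows are invented. It also shows why the exact-match requirement matters: a stray trailing space is enough to stop a row from joining.

```python
def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a key field that must match exactly.

    Returns the joined rows plus the key values that found no match,
    so that near-misses (typos, extra spaces) can be checked by hand.
    """
    lookup = {row[key]: row for row in right_rows}
    joined, unmatched = [], []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**row, **match})
    return joined, unmatched


# Invented sample data: expenses in one sheet, coordinates in another.
expenses = [
    {"MP": "A. Example", "Travel": "1200.50"},
    {"MP": "B. Sample ", "Travel": "300.00"},  # trailing space: will not match
]
locations = [
    {"MP": "A. Example", "lat": 51.5, "lon": -0.12},
    {"MP": "B. Sample", "lat": 53.48, "lon": -2.24},
]

if __name__ == "__main__":
    joined, unmatched = join_on(expenses, locations, "MP")
    print(joined)
    print(unmatched)
```

Returning the unmatched keys alongside the joined rows makes it easier to spot names that need cleaning before the join is run again.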

Can query data in Google spreadsheets once you have imported it

  • =importhtml("URL", "table", 1) – will import a table from the web
  • can then write queries on it (see Hirst’s recent blog for further details)
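For readers who want to see roughly what importing a table and then querying it involves, here is a small Python sketch using only the standard library. It is an approximation for illustration, not a description of how the spreadsheet function is implemented; the HTML page, the column layout and the 500 threshold are invented, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every table on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return the index-th table on the page, counting from 1 like the formula."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Invented page with two tables; the second holds the data of interest.
PAGE = (
    "<table><tr><th>Menu</th></tr></table>"
    "<table>"
    "<tr><th>MP</th><th>Claim</th></tr>"
    "<tr><td>A. Example</td><td>1200</td></tr>"
    "<tr><td>B. Sample</td><td>300</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    header, *rows = import_table(PAGE, 2)
    # A simple "query": rows whose claim is above an arbitrary threshold.
    large_claims = [row for row in rows if float(row[1]) > 500]
    print(header)
    print(large_claims)
```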

Useful sites for advice and how tos:

Francis Irving (@frabcus), mysociety.org – all about “poking the beast”

A new sort of journalism: making decisions about datasets and building stories from mysociety.org websites – open FoI requests and campaign leaflets lead to stories

  • theyworkforyou.com includes voting record analysis
  • whatdotheyknow.com for publicly submitted FoI requests, eg allotment waiting lists – can search filetype:xls for spreadsheets
  • thestraightchoice.org – indexed party campaign leaflets – can browse by constituency, set up an email alert, do a TinEye reverse image lookup, eg BNP
  • Democracy Club – signing up volunteers in every constituency to make election data available (leaflets etc)

Estonia is the goal – they publish all govt data, minutes have to be made available 20 minutes after the end of a meeting

Q&A

  • Is there an international data source? The UN is tidying up its data at the moment; WikiLeaks is good; UNdemocracy.com; International Telecommunication Union; can do deep Google searching for .xls files using site:…
  • Can you strip data from PDFs? OCR software (Adobe?) could strip out basic .jpg data but would have to check for errors
One thought on “news:rewired #3: data mashing panel

  1. Pingback: news:rewired conference, Jan 14 2010 « Librarian of tomorrow
