Tony Hirst (@psychemedia), Open University blogger and sometime Guardian Datastore collaborator
The Guardian Datastore released a spreadsheet of MPs’ expenses data in a convenient format, BUT spreadsheets are hard to make sense of; pictures and graphs tell you more and add value
- VISUAL REPRESENTATIONS of data are a valuable tool
- can interact with the data, have conversations with it, explore stories within it – like mapping travel expense claims to show where MPs travelled from, colour-coded by level of claim
Have to make the data fluid – wire it into other web-based resources to add value
- take data from Google spreadsheet (add &output=csv &range=B2:AH684 or whatever)
- Many Eyes Wikified to visualise data, do calculations
- might need to clean data (get rid of £, & etc) – Yahoo Pipes – eg can run a regex (regular expression) to replace & with AND – then output a cleaned CSV file
- can filter data to pick particular sections eg 1 MP, using theyworkforyou MPs’ info
Workflow: Google spreadsheet → Yahoo Pipes → Many Eyes Wikified → embed in blog etc.
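For anyone who would rather script the clean-up step than build a pipe, here is a minimal Python sketch of the same idea. It assumes you already have the spreadsheet's published URL (the talk did not give one). The function names `csv_export_url`, `clean_cell` and `clean_csv` are invented for illustration; only the `output=csv` and `range` parameters and the "replace & with AND" rule come from the talk.

```python
import csv
import io
import re
from urllib.parse import urlencode


def csv_export_url(base_url, cell_range):
    """Append output=csv and a cell range to a published spreadsheet URL."""
    separator = "&" if "?" in base_url else "?"
    # safe=":" keeps the range readable (B2:AH684 rather than B2%3AAH684)
    params = urlencode({"output": "csv", "range": cell_range}, safe=":")
    return base_url + separator + params


def clean_cell(text):
    """Replace & with AND, drop pound signs and thousands separators."""
    text = re.sub(r"\s*&\s*", " AND ", text)
    text = text.replace("£", "")
    # Only remove commas that sit between digits, so "Smith, J" is untouched
    text = re.sub(r"(?<=\d),(?=\d)", "", text)
    return text.strip()


def clean_csv(raw_csv):
    """Return a cleaned copy of CSV text, cell by cell."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(raw_csv)):
        writer.writerow([clean_cell(cell) for cell in row])
    return out.getvalue()
```

The cleaned text can then be saved as a .csv file and uploaded to whichever visualisation tool you are using.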
Location data isn’t always easy to add so use a join
- Google Fusion Tables / Dabble DB can combine two spreadsheets on a common field (exact match) to match up data, eg MPs’ names and geo coordinates
- can use Yahoo Pipes to plot coordinates of postcodes on a map (y:location)
- can plot it on Google Earth
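The join itself is simple enough to sketch in a few lines of Python. This is not how Fusion Tables or Dabble DB work internally, just an illustration of an exact-match join on a common field. The function name `join_on` and the sample names and coordinates in the usage example are all made up.

```python
def join_on(left_rows, right_rows, key):
    """Exact-match join: keep left rows that have a partner in right_rows."""
    # Index the right-hand table by the common field for quick lookup
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Merge the two rows; left-hand values win if a column clashes
            joined.append({**match, **row})
    return joined


# Invented sample data, for illustration only
claims = [
    {"name": "Example MP A", "claim": 100},
    {"name": "Example MP B", "claim": 250},
]
locations = [
    {"name": "Example MP A", "lat": 51.5, "lon": -0.1},
]

print(join_on(claims, locations, "name"))
```

Because the match is exact, "Example MP B" drops out here: any difference in spelling, spacing or titles between the two spreadsheets means the row will not join, so it is worth checking which rows go missing.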
Can query Google spreadsheets if you import data
- =importhtml("URL", "table", 1) – will import a table from the web
- can then write queries on it (see Hirst’s recent blog for further details)
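As a rough offline equivalent of importing a table and then querying it, here is a Python sketch using only the standard library. It is not what Google spreadsheets does behind the scenes. The names `TableExtractor`, `import_table` and `select` are invented, the table index is 1-based to mirror the formula above, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of the index-th <table> on a page (1-based)."""

    def __init__(self, index):
        super().__init__()
        self.index = index
        self.seen = 0            # number of <table> tags opened so far
        self.in_target = False   # inside the table we want?
        self.in_cell = False     # inside a <td> or <th>?
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.seen += 1
            self.in_target = self.seen == self.index
        elif self.in_target and tag == "tr":
            self.rows.append([])
        elif self.in_target and tag in ("td", "th"):
            if not self.rows:    # tolerate a cell with no enclosing <tr>
                self.rows.append([])
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_target = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_target and self.in_cell:
            self.rows[-1][-1] += data


def import_table(html, index=1):
    """Return the chosen table as a list of rows of stripped strings."""
    parser = TableExtractor(index)
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


def select(rows, column, predicate):
    """Return the body rows whose value in the named column passes the test."""
    header, *body = rows
    position = header.index(column)
    return [row for row in body if predicate(row[position])]
```

For example, `select(import_table(page_html), "Total", lambda v: float(v) > 20)` would keep the rows whose "Total" column is above 20, where `page_html` and the column name are placeholders for your own data.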
Useful sites for advice and how tos:
Francis Irving (@frabcus), mysociety.org – all about “poking the beast”
New sort of journalism: making decisions about datasets and building stories from mysociety.org websites – open FoI requests and campaign leaflets lead to stories
- theyworkforyou.com includes voting record analysis
- whatdotheyknow.com for publicly submitted FoI requests eg allotment waiting lists – can search filetype:xls for spreadsheets
- thestraightchoice.org – indexed party campaign leaflets – can browse by constituency, set up email alert, do TinEye reverse image lookup eg BNP
- Democracy Club – signing up volunteers in every constituency to make election data available (leaflets etc)
Estonia is the goal – they publish all govt data, minutes have to be made available 20 minutes after the end of a meeting
Q&A
- Is there an international data source? UN is tidying up its data at the moment; WikiLeaks is good; UNdemocracy.com; International Telecommunication Union; can do deep Google searching for .xls using site:…
- Can you strip data from PDFs? OCR software (Adobe?) could strip out basic .jpg data but would have to check for errors
Pingback: news:rewired conference, Jan 14 2010 « Librarian of tomorrow