On the web: DocumentCloud

The Christian Science Monitor librarian (@CSMLibrary) flagged up a blogpost about DocumentCloud from the NewsliBlog today.

DocumentCloud is a free online tool for converting PDF documents into web text that you can annotate, making the contents much more accessible and useable. News organisations in particular have used it to provide readers with official documents that add value to a news story.

As Derek Willis (@derekwillis) says in his blogpost:

It’s a great way to maintain a set of files that anyone from the newsroom can access and annotate, making it a good candidate for long-term project work. And when you’re ready to show that work to the world, you can make any or all of the files public.

I’ve not had time to play around with it, but I’ve struggled in the past with ways of stripping text from a PDF quickly and without too many errors, so anything that helps speed up the process can only be a good thing!

Off the top of my head, I can think of several time-consuming queries it would have helped with: editing From the Archive articles that predate our text archive (anything pre-1984), taking content for the Datablog from government documents that are only released as PDFs, or the Russian spies story, for starters.

