Scraping for beginners

courtesy of wickes.co.uk

It’s Sunday night. I have had four days to ruminate on what I learned in my first data journalism session with Paul Bradshaw, who runs the Online Journalism Blog and can be followed on Twitter @paulbradshaw.

By lunch I was ready to throw my computer out of the window and quickly follow suit, but in the afternoon something wonderful happened – I made a scraper!

“What’s a scraper?” I hear all one of you shout. A scraper is a small computer program that pulls bits of information from a website and puts them into a nice, neat table for you.

Using the Olympic Games website as an example, if you wanted to collect data (name, age, country, sport and result, for example) on each and every athlete who competed in this year’s Games, it would take a long, long time. Longer than a Leonard Cohen song, even (© Armando Iannucci).

Almost 11,000 athletes from 203 countries competed and each of them has a unique page on the london2012.com site. That’s a lot of browsing.

A scraper cuts out the slow middle man – you – and quickly pulls the required information into a Google spreadsheet or similar program.
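We didn't write any code in the session, but for the curious, here is roughly what a scraper does under the hood. This is a minimal sketch in Python using only the standard library, and it assumes the information you want sits in an HTML table. SAMPLE_PAGE, TableScraper, scrape_table and to_csv are made-up names for illustration, and the athletes in the sample page are placeholders, not real data.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical HTML standing in for one page of a website; the names and
# values are placeholders, not real athlete data.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Country</th><th>Sport</th></tr>
  <tr><td>Athlete A</td><td>Country X</td><td>Rowing</td></tr>
  <tr><td>Athlete B</td><td>Country Y</td><td>Judo</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text pieces of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Turn rows into CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


rows = scrape_table(SAMPLE_PAGE)
print(to_csv(rows))
```

To point this at a live page rather than the sample, you could fetch the HTML with urllib.request.urlopen(url).read().decode("utf-8") and pass the result to scrape_table, then save the CSV text to a file and open it in a spreadsheet.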

In my next post I’ll cover how we created a simple scraper that collated data from horsedeathwatch.com. Lovely.
