Scraping for beginners

It’s Sunday night. I have had four days to ruminate on what I learned in my first data journalism session with Paul Bradshaw, who runs the Online Journalism Blog and can be followed @paulbradshaw.

By lunch I was ready to throw my computer out of the window and quickly follow suit, but in the afternoon something wonderful happened – I made a scraper!

“What’s a scraper?” I hear all one of you shout. A scraper is a small computer program that pulls bits of information from a website and puts them into a nice, neat table for you.

Take the Olympic Games website as an example. If you wanted to collect data (name, age, country, sport and result, say) on each and every athlete who competed in this year’s games, gathering it by hand would take a long, long time. Longer than a Leonard Cohen song, even (© Armando Iannucci).

Almost 11,000 athletes from 203 countries competed and each of them has a unique page on the site. That’s a lot of browsing.

A scraper cuts out the slow middleman – you – and quickly pulls the required information into a Google spreadsheet or similar program.
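To make that a little more concrete, here is a rough sketch in Python of what a scraper does: read a page, pick out the bits you want, and write them out as a table row. This is not the scraper from the session. The page markup, the class names and the athlete "Jane Example" are all invented for illustration; the real Olympic site's HTML will look different, and a real scraper would download each page rather than read a hard-coded string.

```python
import csv
import io
from html.parser import HTMLParser

# Invented sample markup standing in for one athlete page (an assumption:
# the real site's HTML is not shown in this post and will differ).
SAMPLE_PAGE = """
<html><body>
  <h1 class="name">Jane Example</h1>
  <span class="age">27</span>
  <span class="country">Ruritania</span>
  <span class="sport">Rowing</span>
  <span class="result">4th</span>
</body></html>
"""

# The columns we want in our table (hypothetical class names).
FIELDS = ["name", "age", "country", "sport", "result"]


class AthleteParser(HTMLParser):
    """Collects the text of any tag whose class is one of FIELDS."""

    def __init__(self):
        super().__init__()
        self.current = None  # the field we are currently inside, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in FIELDS:
            self.current = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a tag we care about.
        if self.current and data.strip():
            self.record[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None


def scrape(page_html):
    """Return one athlete's details as a dictionary."""
    parser = AthleteParser()
    parser.feed(page_html)
    return parser.record


def to_csv(records):
    """Turn a list of athlete dictionaries into CSV text, header first."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv([scrape(SAMPLE_PAGE)]))
```

Run `scrape` once per athlete page, collect the results in a list, and `to_csv` gives you text that opens in any spreadsheet program. The slow part, visiting thousands of pages, becomes a loop instead of an afternoon of clicking.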

In my next post I’ll cover how we created a simple scraper that collated data from Lovely.


