In our first session with Paul Bradshaw, we scraped the website horsedeathwatch.com into a Google spreadsheet.
If you haven’t signed up already, then register with Google Drive. Once you’re done, log in and click on the large red “Create” button in the left sidebar. Select “Spreadsheet”:
You’ll then be taken to your blank spreadsheet. Cell A1 should automatically have been selected. If not, click on it:
Next, go to horsedeathwatch.com and copy the URL. We’re going to use Google Docs’ ability to scrape raw data from HTML tables into your newly created spreadsheet.
In cell A1, type:
Paste the URL of the website you’re scraping – in our case http://www.horsedeathwatch.com and close the quotes. Your cell should now contain:
Next, put a comma after the quotation marks and type “table” (including the quotes).
FInally, put another comma, and type “1”) (including the quotes and closed bracket). The final formula should look like this:
Press enter, and the data will be pulled into your spreadsheet, which should appear as below:
It’s worth then copying this data into a new sheet before saving the document, as at the moment your scrape is based on a live website. If the website changes, your data changes, so if you’ll be using this data for a particular story you’ll want to make sure it’s safe.
To do this, select all data (cmd+c on Macs, ctrl+c on Windows), click on Edit, then Copy. Then open a new sheet by clicking on the plus button in the bottom left corner. Click on cell A1 of the new sheet, click Edit, then Paste Special, then Paste Values Only.
Some other “import” functions include:
=importdata (grabs data from a .txt or .csv file)
=importfeed (grabs data from an RSS feed)
=importxml (grabs data from an XML file).