Archive

Visualisation

Inspired by the chaps over at Google, Twitter unveiled their first Transparency Report in July last year, which aimed to shed more light on:

  • government requests for user information
  • government requests to withhold content, and
  • DMCA (Digital Millennium Copyright Act) takedown notices received from copyright holders.

The report also described how and when Twitter responded to these requests, but the take-home message was the rapidly increasing number of government appeals for information: “more in the first half of 2012 than in the entirety of 2011.”

Six months later, in January this year, Twitter rolled out transparency.twitter.com – a new home for their transparency report and part of a drive to make the data more accessible.

Within a couple of minutes of this tweet…

Screen Shot 2013 05 04 at 16 07 45

… I had gone onto the site and quickly copied the data for both removal requests and copyright notices into a Google spreadsheet. The data was split into 2 sets (Q1 – Q2 and Q3 – Q4) and so required a brief bit of formatting before it was ready to visualise.
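For anyone who would rather script that formatting step than do it by hand in a spreadsheet, here is a minimal Python sketch. It assumes the aim is a single full-year total per country, and that each half-year set has been reduced to rows of country and request count. The country names and numbers are invented placeholders, not Twitter's figures, and `combine_halves` is a hypothetical helper name.

```python
from collections import Counter


def combine_halves(first_half, second_half):
    """Sum per-country request counts across two half-year tables."""
    totals = Counter()
    for rows in (first_half, second_half):
        for country, requests in rows:
            totals[country] += requests
    # Sort by country name so the output is stable and easy to paste back.
    return sorted(totals.items())


# Placeholder rows only: (country, number of requests).
jan_to_jun = [("Country A", 3), ("Country B", 10)]
jul_to_dec = [("Country A", 4), ("Country C", 1)]

full_year = combine_halves(jan_to_jun, jul_to_dec)
for country, requests in full_year:
    print(country, requests)
```

A country that only appears in one half of the year still ends up in the combined table, which is the sort of thing that is easy to miss when copying and pasting by hand.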

Screen Shot 2013 05 04 at 17 26 06

After spending 10 minutes or so wrangling the data into the right format (read my earlier post regarding Google Fusion Tables and getting hold of shape files) I had the following data, ready to chart, plot, map, or do whatever was needed:

Screen Shot 2013 05 05 at 13 20 19

Screen Shot 2013 05 04 at 17 31 41

There are a number of ways to visualise the above datasets, but to ensure a quick turnaround of the data, I opted for the built-in Google options of Google Charts and Google Fusion Tables. Both tools are free.

If you’re familiar with creating charts and graphs in Excel, Google’s Chart Wizard should be fairly intuitive. Even if you’re a bit rusty, the wizard previews any changes you make to your chart before you finalise, so play around with the options until you’ve got your visualisation the way you want it.

Be wary of choosing an unnecessary and overcomplicated type of chart; with relatively simple datasets like these, if your visualisation isn’t helping you to distinguish between different variables or interpret the data in a more efficient way, it’s not doing its job properly.

The ease and speed of Google’s free tools meant that within 15 minutes of Twitter’s original tweet, I had produced and tweeted this:

Screen Shot 2013 05 05 at 13 36 23

Screen Shot 2013 05 05 at 13 43 54

…and in less than 60 minutes, I had produced and tweeted this:

Screen Shot 2013 05 05 at 13 43 26

Screen Shot 2013 05 05 at 13 44 16

As I don’t have a self-hosted WordPress blog, you’ll have to click on the above image to link to the actual Fusion map.

This fast turnaround of data is the sort of style that the Guardian datablog utilises effectively every day, thereby keeping their posts newsworthy. While custom-coded visualisations are often very impressive, they take time and effort to create. For simple data that you’ve accessed soon after its release, this is the way to go.


While working on The Hackney Post during City’s production weeks, I was seconded away from the web team for a day or two in order to create some visualisations for the paper.

This requires a slightly different approach to data visualisation. On the web you don’t necessarily need to display the data points on a bar graph, for example, as the reader can hover their mouse over the relevant bar and see the specific data point in a pop-up window. The same can be done with Google Fusion maps and other types of embedded visualisations to display more information in a cleaner way.

The same approach obviously doesn’t work for print, which means you have to think a little more carefully about how to cram all of that vital data into a visualisation without ruining its readability and therefore the reader’s subsequent comprehension.

A knowledge of Photoshop can come in handy here, as you can tweak and fine-tune to your heart’s content – much more so than with the built-in options available to you with most free web tools.

For a story on drug raids in Hackney that was to appear in the next day’s paper, I created the map below, using crime rates available from the Met Police, a screenshot of a Google Fusion map I created containing the colour-coded Hackney Ward boundaries by crime rate, and the locations of those recently arrested under a Hackney Police operation.

hackney-crime-map-for-paper

The font for each ward name was chosen as it matched the Hackney Post’s new logo, which I also created as part of the paper’s redesign:

Hackney-Post-logo700

While the key could be a little clearer, the image caption in the paper also clarified what the reader was looking at, and how to interpret the map.

A versatile data journalist is also one who can work with traditional print media. Amassing any sort of transferable skills in data journalism will make you a worthwhile addition to any organisation.

In my last post I wrote about how I learned to use Google Fusion tables to create an interactive map.

I made another map this morning, featuring some data that fellow interhacktive, Sam Creighton, sent my way yesterday.

This one visualises requests that Google have received from governments the world over, asking for user data to be handed over. The data I mapped is for the period Jan 1st – Jun 30th 2012.

Some interesting figures from a cursory look at the data:

  • Google complied with 90% of requests from the US
  • Hungary, Russia and Turkey submitted a combined total of 262 requests from Jan – Jun, but Google didn’t comply with any of them
  • The US submitted more requests in the first six months of this year (7,969) than it did in the entirety of 2009 and 2010 combined (7,867).
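The gap in that last comparison is narrower than it might sound, and it is easy to check. A quick sketch using only the two US figures quoted above (the variable names are my own):

```python
# The two US figures quoted in the list above.
us_requests_2012_period = 7969
us_requests_2009_and_2010 = 7867

# How many more requests, in absolute and percentage terms.
difference = us_requests_2012_period - us_requests_2009_and_2010
percent_more = 100 * difference / us_requests_2009_and_2010

print(difference)              # 102
print(round(percent_more, 1))  # 1.3
```

So the 2012 period edges out the two earlier years combined by 102 requests, or roughly 1.3%.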

Geopolitics aside, the last point is particularly interesting as it hints at how rapidly the digital world has grown, even over the last few years.

Google blogged about the rise in government requests earlier this week:

The information we disclose is only an isolated sliver showing how governments interact with the Internet, since for the most part we don’t know what requests are made of other technology or telecommunications companies. But we’re heartened that in the past year, more companies like Dropbox, LinkedIn, Sonic.net and Twitter have begun to share their statistics too. Our hope is that over time, more data will bolster public debate about how we can best keep the Internet free and open.

I’ve heard a lot about Google Fusion Tables over the last few weeks. Simon Rogers, editor of the Guardian Datablog, must have mentioned them 10 times when he came to talk at City last month. Some of our other lecturers, including Paul Bradshaw and Gary Moskowitz, have also mentioned them in class or in passing.

To a fledgling interactive journalist, they sound pretty important. Especially when I don’t really know what they are.

Being the industrious student that I am, I thought I would ignore all of the immediate work I should have been doing over the weekend, and instead have a play around.

I thought I would keep things simple and create an interactive map – the type of thing you see on the Guardian all the time.

I knew I would need some data so I decided to head over to whatdotheyknow.com to see if anyone had kindly submitted a single freedom of information request to all of London’s borough councils.

Luckily for me, a man called Hugh Roberts had recently used whatdotheyknow to FOI 165 councils across the country about dog shit.

Success!

Hugh had asked each of the 165 councils (including the London borough ones):

    • how many complaints have you received about dog fouling from 2005 to 2012?
    • how many fixed penalty notices for dog fouling did you issue during the same period?
    • what were the dates of any dog fouling campaigns the council participated in?

I decided to discard the third question for the moment and focus on the first two.

I created a Delicious bundle consisting of every whatdotheyknow response page for each of London’s 33 boroughs.

Unfortunately, Hugh probably didn’t attend Heather Brooke’s excellent talk on the FOI act last month, so he hadn’t known to ask the councils to provide the data in Excel format. This meant laboriously trawling through every response letter, deciphering the myriad of different ways that each council had chosen to display its information.

The irony that I was doing the digital equivalent of sifting through dog shit wasn’t lost on me.

I then totted up the figures and put them into a spreadsheet:

Not every council had successfully responded to Hugh, so I only had data for 26 of the 33 London boroughs. Once I had finished with the spreadsheet, I exported it as a CSV (comma separated values) file.
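I did the totting up and the export by hand in a spreadsheet, but the same two steps can be scripted. A rough Python sketch, assuming the yearly complaint counts have already been typed up per borough. The borough names and numbers are placeholders rather than real figures from the responses, and the column names are just the ones I would pick.

```python
import csv
import io

# Placeholder data: yearly complaint counts per borough (not real figures).
yearly_complaints = {
    "Borough A": [10, 12, 9],
    "Borough B": [4, 7, 5],
}


def totals_to_csv(yearly_counts):
    """Sum each borough's yearly counts and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Borough", "Complaints"])
    for borough, counts in sorted(yearly_counts.items()):
        writer.writerow([borough, sum(counts)])
    return buffer.getvalue()


csv_text = totals_to_csv(yearly_complaints)
print(csv_text)
```

To produce a file for upload rather than a string, the same writer can be pointed at `open("dog_fouling.csv", "w", newline="")` instead of the in-memory buffer (the filename is only an example).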

Next, I went to my Google Drive and clicked on Create –> More –> Fusion Table (experimental). I uploaded my CSV file, gave the table a name and a description, and clicked Finish.

The data was then imported into a Google table:

Then I went to research.google.com/tables and searched for ‘London boroughs’. I clicked on a result that looked like it would contain the borough outlines and copied the URL.

Back on my original table, I clicked on File –> Merge, and pasted the URL into the box that appeared. After clicking Next, I made sure that the ‘Borough’ categories matched up from both tables, clicked Next again, made sure every tick box was ticked, and finally clicked Merge.
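The merge is essentially a join on the ‘Borough’ column, which is why it matters that the names match in both tables. The sketch below is not Fusion Tables’ own code, just an illustration of the idea in Python. The `merge_on_borough` helper, the `Outline` column name and the placeholder geometry strings are all assumptions of mine, and ignoring case and stray spaces is my choice for the check, not a description of how Fusion Tables matches names.

```python
def merge_on_borough(data_rows, outline_rows):
    """Join data rows to outline rows by borough name.

    Returns the merged rows plus a list of borough names that had no
    matching outline, so mismatches can be fixed before mapping.
    """
    # Index outlines by a normalised name (case and outer spaces ignored).
    outlines = {row["Borough"].strip().lower(): row for row in outline_rows}
    merged, unmatched = [], []
    for row in data_rows:
        key = row["Borough"].strip().lower()
        if key in outlines:
            merged.append({**row, "Outline": outlines[key]["Outline"]})
        else:
            unmatched.append(row["Borough"])
    return merged, unmatched


# Placeholder rows: the counts are invented and one name is misspelt on purpose.
data_rows = [
    {"Borough": "Hackney", "Complaints": 1},
    {"Borough": "camden ", "Complaints": 2},
    {"Borough": "Hackny", "Complaints": 3},
]
outline_rows = [
    {"Borough": "Hackney", "Outline": "<placeholder geometry>"},
    {"Borough": "Camden", "Outline": "<placeholder geometry>"},
]

merged, unmatched = merge_on_borough(data_rows, outline_rows)
print(len(merged), unmatched)
```

Anything that lands in `unmatched` is a borough that would be missing its outline on the finished map.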

Google then fuses (see where the fusion aspect comes into it? Clever stuff!) the two tables together. In the resulting window, there’s a tab called ‘Map of Outline’. Click it, and hey presto, your dog shit data should now be lovingly housed by beautiful, red borough boundaries.

Go to Tools –> Map styles to have a play with the colour and border settings.

If you can’t be bothered to make one yourself, have a look at my end result by clicking on the screenshot below (unfortunately there’s no way to embed the final product on WordPress blogs). I set up a gradient colour system so you can quickly glean a bit more information from it before interrogating the figures further.
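A gradient like this boils down to splitting the range of values into bands and giving each band a colour. Here is a minimal Python sketch of that idea, assuming equal-width bands. The `gradient_bucket` helper, the colour codes and the range are placeholders of mine, not the settings I used on the map or anything taken from Fusion Tables.

```python
def gradient_bucket(value, low, high, colours):
    """Pick a colour for value by splitting [low, high] into equal-width bands."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp so values outside the range fall into the first or last band.
    clamped = min(max(value, low), high)
    band_width = (high - low) / len(colours)
    # The top of the range would index one past the end, so cap it.
    index = min(int((clamped - low) / band_width), len(colours) - 1)
    return colours[index]


# Placeholder palette, lightest to darkest.
colours = ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]

for complaints in (0, 30, 100, 250):
    print(complaints, gradient_bucket(complaints, 0, 100, colours))
```

Equal-width bands are the simplest choice. If a couple of boroughs have far higher figures than the rest, most of the map will end up in the lightest band, so it is worth trying different band edges.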

As with all data journalism, the data is useless unless you use it to tell a story. My main aim was to learn a bit about the Google Fusion software and how to transform data from a spreadsheet into something more interesting, but I’ll follow this post up with some ideas on where I might head next if I were writing a story.