Processing and Data Visualization with Jer Thorp

NY Times mentions of "olympics" and "election", 1981-2011

Remember Just Landed, that awesome visualization of airplane flights built from Twitter data that floated around about two years ago? How about Good Morning, the visualization of people tweeting the words “good morning” over a 24-hour period? Both were created by artist Jer Thorp, who is currently Data Artist in Residence at the New York Times and a visiting professor at NYU’s ITP program. I’ve been following Jer’s work for some time, so I was very excited to see recently that he was offering workshops in Processing (a programming environment for data visualization that I use and have mentioned in previous posts). I’ve used Processing for a few dataviz projects at work, and I’d really like to do a lot more. Foursquare has an amazing dataset, and I feel like I’m wasting an opportunity if I’m not always trying out new ways of exploring that data through visualization.

Anyway, Jer Thorp is one of the most respected names in data visualization, so I jumped at the chance to take his workshop on Processing and Data Visualization. The workshop was held at Jer’s apartment in DUMBO (with an incredible view of the Brooklyn Bridge). There were about 7 of us, a good-sized group that allowed for personal instruction. Jer makes a great teacher. His focus in the workshop (and, as far as I can tell, in his work) is on exploration through visualization. In other words, rather than deciding up front on the end result you want, you start from the data and iterate gradually on your visualization until interesting insights begin to emerge. One obvious advantage of this approach is that it’s less likely to encourage bias or cherry-picking. The other advantage is that it allows for a sort of “rough-draft” approach to creating the final visualization. There can be a lot of complex factors, algorithms, and data-massaging involved in the process, so it’s ideal to take one small step at a time rather than trying to do the whole thing in one go.

This working method also translates well to instruction. The workshop consisted of 4 tutorial projects in Processing, ranging from a fairly simple visualization of time-series data to a more complex project involving data pulled from the New York Times API. The same principles applied throughout the lessons: gradually building on an idea, adding more variables, and furthering the exploration of the dataset. In our first project, we were given a generic series of data points that we graphed and then gradually modified, with the goal of figuring out what the data actually was. Click on the image at left to see the graph. Bonus points if you can figure out what it is (we did come up with the answer, with what I think was only minimal nudging from Jer).

The second project involved working with data from the excellent We Feel Fine project. Check it out if you’re not familiar, but the basic idea is to harvest “feeling data” from blog entries across the internet and display it in creative ways. They also offer an API through which anyone can access this data and slice and dice it however they want. Even with some very basic displays and controls, we were able to come up with some pretty interesting insights into the information we pulled. I particularly liked the approach we took to displaying this data (borrowing heavily from how it’s used on the We Feel Fine site). The atomic unit of data is a “feeling”, i.e. a sentiment-based word pulled from a blog. We represented these units with colored dots, which we simply moved around in various arrangements to show different combinations and analyses of the data. I like this method so much that I built a project with it this weekend based on data from the foursquare API, which I will post here soon (I’m still putting on the final polish).
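For anyone curious how the “one dot per feeling” idea might look in code, here is a minimal sketch in plain Java (Processing sketches are Java under the hood). This is not the workshop code: the `Feeling` record, its `attribute` field, the `arrange` and `step` helpers, and the sample words are all hypothetical, made up for illustration rather than pulled from the We Feel Fine API. The assumption is that re-arranging just means assigning each dot a new target position by grouping on some key, and that every frame each dot eases part of the way toward its target, much like Processing’s `lerp()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FeelingDots {
    // Hypothetical unit of data: one sentiment word from a blog post,
    // plus one extra attribute we might want to sort on.
    record Feeling(String word, String attribute) {}

    // A dot on screen that eases from its current position toward a target.
    static final class Dot {
        final Feeling feeling;
        double x, y;   // current position
        double tx, ty; // target position
        Dot(Feeling f) { feeling = f; }

        // Move a fraction of the remaining distance (similar to Processing's lerp()).
        void step(double amount) {
            x += (tx - x) * amount;
            y += (ty - y) * amount;
        }
    }

    // Assign new targets: one block of dots per distinct key, `perRow` dots wide,
    // blocks placed left to right in order of first appearance. Returns group sizes.
    static Map<String, Integer> arrange(List<Dot> dots, Function<Feeling, String> key,
                                        int perRow, double spacing, double blockGap) {
        Map<String, Integer> groupSizes = new LinkedHashMap<>();
        Map<String, Integer> groupIndex = new LinkedHashMap<>();
        for (Dot d : dots) {
            String k = key.apply(d.feeling);
            groupIndex.putIfAbsent(k, groupIndex.size());
            int n = groupSizes.merge(k, 1, Integer::sum) - 1; // position within the group
            double left = groupIndex.get(k) * (perRow * spacing + blockGap);
            d.tx = left + (n % perRow) * spacing;
            d.ty = (n / perRow) * spacing;
        }
        return groupSizes;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        String[][] sample = {{"happy", "morning"}, {"sad", "night"}, {"happy", "night"}, {"tired", "morning"}};
        List<Dot> dots = new ArrayList<>();
        for (String[] s : sample) dots.add(new Dot(new Feeling(s[0], s[1])));

        // First arrangement: one cluster per feeling word.
        System.out.println(arrange(dots, Feeling::word, 2, 10, 20)); // prints {happy=2, sad=1, tired=1}

        // Re-arrange by the other attribute; the dots then glide to their new targets.
        arrange(dots, Feeling::attribute, 2, 10, 20);
        for (int frame = 0; frame < 30; frame++) {
            for (Dot d : dots) d.step(0.2);
        }
    }
}
```

In an actual Processing sketch, `step()` would be called from `draw()` and each dot drawn with `ellipse()` at its current `(x, y)`, so switching the grouping key animates the dots into the new arrangement.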

The final projects were also based on live data from an API, this time from the New York Times. The NYT offers a very rich dataset (all of its articles from the past 30 years are meticulously tagged with a variety of metadata), and there are lots of interesting ways to slice it. At the top of this post is a radial bar chart showing two time series: mentions of the words “election” and “olympics” in NYT articles. This is just scratching the surface of the data available; you can also pivot on a particular tag or “facet” to explore related terms. It’s a very powerful tool for exploring the zeitgeist of the last three decades.
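The geometry behind a radial bar chart is fairly simple, so here is a hedged sketch of one way to compute it (my own assumption, not Jer’s workshop code): spread the years evenly around a circle, then draw each bar outward from an inner ring with a length proportional to that year’s count. The `map()` helper mirrors what Processing’s built-in `map()` does; `layout`, `Bar`, and the sample counts are hypothetical placeholders, not real results from the NYT API.

```java
import java.util.ArrayList;
import java.util.List;

public class RadialBars {
    // One bar: starts on the inner ring (x1, y1) and ends further out (x2, y2).
    record Bar(int year, double x1, double y1, double x2, double y2) {}

    // Linear re-mapping of v from [a1, b1] to [a2, b2], like Processing's map().
    static double map(double v, double a1, double b1, double a2, double b2) {
        return a2 + (b2 - a2) * ((v - a1) / (b1 - a1));
    }

    // Lay out one time series (one count per year) around a circle centered at (cx, cy).
    static List<Bar> layout(int firstYear, int[] counts, double cx, double cy,
                            double innerRadius, double maxBarLength) {
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        List<Bar> bars = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            // Evenly spaced angles; subtracting PI/2 puts the first year at 12 o'clock
            // in screen coordinates, where y grows downward.
            double theta = map(i, 0, counts.length, 0, 2 * Math.PI) - Math.PI / 2;
            // Bar length scaled so the busiest year gets the full maxBarLength.
            double len = (max == 0) ? 0 : map(counts[i], 0, max, 0, maxBarLength);
            double cos = Math.cos(theta), sin = Math.sin(theta);
            bars.add(new Bar(firstYear + i,
                    cx + innerRadius * cos, cy + innerRadius * sin,
                    cx + (innerRadius + len) * cos, cy + (innerRadius + len) * sin));
        }
        return bars;
    }

    public static void main(String[] args) {
        // Hypothetical counts for illustration, NOT real NYT data.
        int[] counts = {10, 40, 25, 0};
        for (Bar b : layout(1981, counts, 0, 0, 50, 100)) {
            System.out.printf("%d: (%.1f, %.1f) -> (%.1f, %.1f)%n",
                    b.year(), b.x1(), b.y1(), b.x2(), b.y2());
        }
    }
}
```

To show two series such as “election” and “olympics,” one option is to call `layout` once per series, for example with different inner radii or colors, and draw each `Bar` with Processing’s `line()`.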

This workshop did for me exactly what tutorial learning should do: I developed a better understanding of various Processing methods and design patterns through the use of examples, and I was inspired to remix and expand on those examples in my own work. It’s pretty rare in these post-collegiate days that I get the chance to sit down with someone who’s an expert in a field I’m interested in and have them walk me through their approach and process so that I can understand it. If you’re at all interested in learning about Processing and data visualization, I highly recommend you give one of Jer’s workshops a try.