A Week of Check-ins on the Path to One Billion (behind the scenes)

One of the great things about working at foursquare is having access to a huge dataset that — when viewed in aggregate — can reveal some really amazing patterns in how people use the product across the globe. To celebrate our one-billionth check-in, we wanted to create something that showed the scope and breadth of that data. In past dataviz projects, we’ve looked at the total history of foursquare and analyzed specific aspects of the check-in data. For this project, we decided to limit it to one typical week of foursquare usage and keep the visualization simple, to give people a chance to make their own observations. What we settled on was a straightforward map, with time-lapsed check-ins animating across it. We chose to color-code the venue categories to reveal a bit about the specific activity going on in different places throughout the day. Here are some of the tools used to create this visualization:

  • The check-in data was provided by our data team, pulled from the Hadoop/Hive infrastructure they’ve built for analysis purposes. You can learn more about that setup here. From Hive we get a CSV of all the check-in lat/longs and whatever other fields we queried. Even for just one week of check-ins, that’s 2.9GB of data.
  • The visualization itself was mostly created with Processing. I’m a huge fan of Processing because of the flexibility it allows. There are definitely other great dataviz tools out there, but with a Processing sketch it’s easy to slap together a basic version of your idea and then iterate on it, with basically no limits on what you can change or customize. I’d point anyone looking to experiment with dataviz at the great tutorials at processing.org/learning. The basic paradigm of animation in Processing is that you queue up all the data needed to render one frame of your animation, draw it, and then move to the next frame. For each frame of this sketch, the code reads a batch of lines from the CSV, stores the check-ins in memory (so a given check-in can persist for multiple frames and change position if the map pans or zooms), calculates the category percentages, and then draws the whole thing. This approach isn’t great for real-time animation, but for something pre-rendered like this video it works fine.
  • I used a Processing port of the Modest Maps library to draw the map tiles and handle all of the projection math. On the first couple of check-in map projects I did, I tried to handle the projection math myself, but quickly ran into my limitations as a mathematician. With Modest Maps, all I have to do is call InteractiveMap.locationPoint() and it gives me x/y coordinates for where to draw the dot.
  • I created the map tiles themselves using a great open source app called TileMill. TileMill basically lets you take shapefiles (I used a stock world map that comes with the app) and apply CSS-esque stylers to them to create totally customized map tiles. This was my first time working with it, and it was astonishingly easy. The TileMill app runs a local map tile server, so all I had to do was point Modest Maps at the local URL, and I had a custom interactive map running in Processing (take a look at the modestmaps.providers classes for examples of how to do this). If I weren’t trying to animate the data, I could actually have done all of the work right in TileMill by just converting the data to GeoJSON, importing it, and applying a few stylers.
  • Processing spits out a sequence of images (using saveFrame()), which I assembled into a movie using QuickTime Player 7 (they got rid of this feature in the latest QuickTime Player, grrrr. Fortunately you can still install QT7 from your OS install CD). I added a few last bits in with Adobe After Effects: the zoom-in bubbles on a few cities across the globe (I rendered the check-ins in Processing but overlaid them in AE), a subtle glow on the category bars, and the title and end cards.
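To make the per-frame buffering idea above concrete, here’s a rough sketch in plain Java (not a real Processing sketch, and not our actual code — the record layout and the “frames to live” idea are illustrative assumptions). Each frame, the check-ins read from the CSV for that frame’s time slice get added to an in-memory buffer, and anything that has been on screen long enough ages out:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Minimal sketch of per-frame check-in buffering: dots persist for a fixed
// number of frames so they linger on screen, then get dropped.
public class CheckinBuffer {
    // Hypothetical check-in record: epoch seconds plus lat/lng.
    static class Checkin {
        final long timestamp;
        final double lat, lng;
        int framesLeft; // how many more frames this dot stays visible
        Checkin(long timestamp, double lat, double lng, int ttlFrames) {
            this.timestamp = timestamp;
            this.lat = lat;
            this.lng = lng;
            this.framesLeft = ttlFrames;
        }
    }

    private final Deque<Checkin> visible = new ArrayDeque<>();
    private final int ttlFrames;

    public CheckinBuffer(int ttlFrames) { this.ttlFrames = ttlFrames; }

    // Called once per frame with the rows read from the CSV for this frame's
    // time slice ({timestamp, lat, lng}); ages out stale dots first.
    public void advanceFrame(List<double[]> rows) {
        Iterator<Checkin> it = visible.iterator();
        while (it.hasNext()) {
            if (--it.next().framesLeft <= 0) it.remove();
        }
        for (double[] r : rows) {
            visible.add(new Checkin((long) r[0], r[1], r[2], ttlFrames));
        }
    }

    public int visibleCount() { return visible.size(); }
}
```

In a real sketch, draw() would call something like advanceFrame() once per frame and then loop over the buffer to plot each dot — the point is just that the buffer, not the CSV, is what gets drawn.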
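For the curious, this is roughly the projection math that Modest Maps’ locationPoint() call spares you from writing: the standard Web Mercator formula for turning a lat/lng into world-pixel coordinates at a given zoom (this is the textbook formula, not Modest Maps’ actual implementation):

```java
// Standard Web Mercator: lat/lng -> world pixel x/y at a zoom level,
// assuming 256px tiles. Shown for illustration only.
public class Mercator {
    static final int TILE_SIZE = 256;

    // Returns {x, y} in world pixels at the given zoom.
    public static double[] locationPoint(double lat, double lng, int zoom) {
        double scale = TILE_SIZE * Math.pow(2, zoom);
        double x = (lng + 180.0) / 360.0 * scale;
        double sinLat = Math.sin(Math.toRadians(lat));
        double y = (0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI)) * scale;
        return new double[]{x, y};
    }
}
```

At zoom 0 the whole world is one 256px tile, so (0°, 0°) lands at pixel (128, 128) — the center of the map.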
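To give a taste of the CSS-esque styling TileMill uses (a dialect called CartoCSS), here’s a tiny hypothetical example — the “world” layer name is made up and would match whatever your imported shapefile layer is called:

```css
/* Hypothetical CartoCSS: dark basemap with subtle country outlines. */
Map { background-color: #10141a; }
#world {
  polygon-fill: #1c2430;
  line-color: #2e3a4a;
  line-width: 0.5;
  [zoom >= 4] { line-width: 1; }  /* thicker borders when zoomed in */
}
```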

My favorite thing about dataviz is starting with the simplest visualization possible and then gradually iterating on it to further explore the data. I learned a lot on this project, and also built up my toolkit quite a bit. I’m looking forward to putting these new tools to use and adding even more on the next project.

If you want any further info on any of the above tools or have any techniques of your own to share, definitely hit me up in the comments.