HubCab: Visualizing 170 Million NYC Taxi Trips


data visualization



new york


Project Host

HubCab is a project by the MIT Senseable City Lab


Benedikt Groß

Back-end/Data Mining

Michael Szell


Joey Lee

Project Coordination

Eric Baczuk

Web Development

47 Nord, Pierrick Thébault, and Benedikt Groß

Map Data


Developed with

Maperitive, Shapely, MongoDB, and Mapbox

HubCab is an interactive visualization that invites you to explore the ways in which over 170 million taxi trips connect the City of New York in a given year. This interface provides a unique insight into the inner workings of the city from the previously invisible perspective of the taxi system, with never-before-seen granularity. HubCab allows you to investigate exactly how and when taxis pick up or drop off individuals and to identify zones of condensed pickup and drop-off activity. It allows you to navigate to the places where your taxi trips start and end and to discover how many other people in your area follow the same travel patterns. What do these visualizations tell us about collective mobility? How many of these cabs might you have been able to share with the people around you? And how might entertaining these questions be the first step in building a more efficient and cheaper taxi service?

With an ever-increasing trove of real-time urban data streams, we are able to see precisely where, how, and at what times different parts of our cities become stitched together as hubs of mobility. By using these pervasive, interconnected, and “smart” technologies, we can begin to unravel the complexity of our travel patterns and identify how we can reduce the social and environmental costs embedded in our transportation systems. In HubCab we target taxicab services as a way to understand the linkages between our travel habits and the places we travel to and from most often.

→ See the project at MIT Senseable City Lab: HubCab


Science of Sharing

The HubCab tool expands and changes the perception of urban space using a large-scale data set. Studying this data, we show in a scientific study [1] the vast potential of taxi shareability. Our analysis introduces the novel concept of “shareability networks”, which allows for efficient modeling and optimization of trip-sharing opportunities. This mathematical approach makes use of network densification effects and represents a substantial advance over existing state-of-the-art solutions to social sharing problems. Such a shared system is expected to lead to less congestion in road traffic, lower running costs and split fares, and a cleaner, less polluted environment [2].

The sharing benefits displayed on the map refer to total fare savings to passengers, distance savings in traveled miles, and emission savings in kg of CO2 that come from potentially shared trips. Our research [1] shows that taxi sharing could reduce the number of trips by 40% with only minimal inconvenience to the passengers. Here we assume this 40% shareability rate, together with the following highly simplifying assumptions: a fare of $3.00 + $2.50/mi [3], using Rate Code 1 and not accounting for low-motion fares or special surcharges, and average CO2 emissions of 423 g/mi [4]. Traveled distance is simplified as linear (straight-line) distance between pickup and drop-off points.
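The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the map: the constants come from the paragraph above, while the function names, the use of the haversine formula for linear distance, and the idea that savings equal 40% of the unshared totals are our own simplifications for the example.

```python
import math

SHAREABILITY = 0.40        # share of trips that could be saved, from [1]
BASE_FARE_USD = 3.00       # flat part of the simplified fare [3]
FARE_PER_MILE_USD = 2.50   # distance part of the simplified fare [3]
CO2_G_PER_MILE = 423.0     # average emissions per mile [4]
EARTH_RADIUS_MI = 3958.8   # mean Earth radius in miles


def linear_distance_mi(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two GPS points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def sharing_benefits(trip_count, distance_mi):
    """Estimate savings for trip_count trips of distance_mi miles each.

    ASSUMPTION (hypothetical model): savings are 40% of the unshared totals.
    """
    total_miles = trip_count * distance_mi
    total_fare = trip_count * (BASE_FARE_USD + FARE_PER_MILE_USD * distance_mi)
    return {
        "fare_usd": SHAREABILITY * total_fare,
        "distance_mi": SHAREABILITY * total_miles,
        "co2_kg": SHAREABILITY * total_miles * CO2_G_PER_MILE / 1000.0,
    }
```

For example, `sharing_benefits(1000, 2.0)` estimates the savings for a flow of 1,000 trips of two miles each; the distance itself could come from `linear_distance_mi` applied to the pickup and drop-off coordinates.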

Technical Development

The basis of the HubCab tool is a data set of all 170 million taxi trips of all 13,500 Medallion taxis in New York City in 2011. The data set contains GPS coordinates of all pickup and drop-off points and the corresponding times. Cartographic data of street shapes were obtained from OpenStreetMap. The streets were cut into over 200,000 street segments of 40 m length each with a Python script and the help of the Shapely Python library, and imported into a MongoDB database. Pickup and drop-off points were matched to the closest street segments. Street types unlikely to contain taxi pickups or drop-offs, such as footpaths, trunks, service roads, etc., were not used in the matching process. Line widths of yellow and blue street segments on low zoom levels were styled on a logarithmic scale. The pickup and drop-off points, represented as dots on the high zoom levels, were generated via an ArcPy script, being placed randomly within a box around a given street segment, with the box width again following a logarithmic scale. GPX files of the dots were styled using Maperitive, then merged and amended for different zoom levels. The dots and street line files were layered together with MapBox, which is the platform that streams all the map content.
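The segmentation and matching steps can be illustrated with a small, dependency-free sketch. It is not the original script: the project used Shapely, whereas this example uses only the standard library, assumes street coordinates are already projected to metres, and matches points by brute force. The function names are hypothetical.

```python
import math

SEGMENT_LENGTH_M = 40.0  # segment length stated in the text


def cut_street(points, seg_len=SEGMENT_LENGTH_M):
    """Split a polyline (list of (x, y) in metres) into pieces of length seg_len.

    The last piece of a street may be shorter than seg_len.
    """
    pieces, current, room = [], [points[0]], seg_len
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        edge = math.hypot(x2 - x1, y2 - y1)
        pos = 0.0  # distance already consumed along this edge
        while edge - pos >= room:
            pos += room
            t = pos / edge
            cut = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            current.append(cut)
            pieces.append(current)
            current, room = [cut], seg_len  # start a new piece at the cut
        if edge - pos > 0:
            current.append((x2, y2))
            room -= edge - pos
    if len(current) > 1:
        pieces.append(current)
    return pieces


def point_segment_distance(p, a, b):
    """Distance from point p to the straight line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = dx * dx + dy * dy
    t = 0.0 if norm == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / norm))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def closest_piece(p, pieces):
    """Index of the street piece closest to point p (brute-force search)."""
    def dist(piece):
        return min(point_segment_distance(p, a, b) for a, b in zip(piece, piece[1:]))
    return min(range(len(pieces)), key=lambda i: dist(pieces[i]))
```

Calling `cut_street([(0.0, 0.0), (100.0, 0.0)])` yields three pieces (40 m, 40 m, and a 20 m remainder), and `closest_piece` then assigns a pickup or drop-off point to one of them. With hundreds of thousands of segments, a spatial index would be a more realistic choice than the brute-force search shown here.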

The data back end of HubCab runs on a MongoDB database containing all street segments and their coordinates, and all flows between each pair of street segments. The number of all possible street segment pairs is over 40 billion (200,000 times 200,000) per map. Radius selection is dynamic, using MongoDB’s near function to obtain flows from all segments within the radius of the pickup marker to all segments within the radius of the drop-off marker. With nine maps (one for the yearly data, eight for 3-hour time segments on all Fridays/Saturdays) and three selectable radii, there is a total of over one trillion flow combinations that can be explored with HubCab. Communication between MongoDB and the front end is realized via PHP scripts and JavaScript with JSONP.
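The counts quoted above follow from simple multiplication, and a radius query can be expressed as a plain filter document. The sketch below is illustrative only: the field name `loc`, the helper `near_filter`, and the use of the GeoJSON form of MongoDB's `$near` operator with `$maxDistance` in metres are assumptions for the example, not a description of the actual HubCab schema or queries.

```python
SEGMENTS = 200_000  # "over 200,000" street segments; lower bound used here
MAPS = 9            # one yearly map plus eight 3-hour time segments
RADII = 3           # selectable radii

# Ordered (pickup segment, drop-off segment) pairs on a single map.
pairs_per_map = SEGMENTS * SEGMENTS

# Flow combinations across all maps and radii.
total_combinations = pairs_per_map * MAPS * RADII


def near_filter(lon, lat, radius_m, field="loc"):
    """Build a MongoDB filter for documents near a point.

    ASSUMPTION: segments store a GeoJSON point in a hypothetical `loc` field
    covered by a 2dsphere index.
    """
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": radius_m,
            }
        }
    }


print(f"{pairs_per_map:,} pairs per map")
print(f"{total_combinations:,} flow combinations in total")
```

With the lower bound of 200,000 segments this gives 40 billion pairs per map and 1.08 trillion combinations in total, consistent with the "over 40 billion" and "over one trillion" figures in the text. A filter built by `near_filter` would be evaluated once around the pickup marker and once around the drop-off marker to collect the two sets of segments.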


  • [1] P. Santi, G. Resta, M. Szell, S. Sobolevsky, S. Strogatz, C. Ratti. Taxi pooling in New York City: a network-based approach to social sharing problems (2013)
  • [2] M. Szell, B. Groß. Hubcab – Taxi-Fahrgemeinschaften, digital erkundet. In: Die Stadt entschlüsseln, Bauwelt Fundamente, Birkhäuser. Eds.: D. Offenhuber, C. Ratti (2013)
  • [3] NYC Taxi & Limousine Commission. Taxi Rate of Fare
  • [4] U.S. Environmental Protection Agency. Greenhouse Gas Emissions from a Typical Passenger Vehicle