Behind the scenes with MBTA data.

In this article we address the problem of knowing, with precision, the final destination of riders.

As MBTA riders know, the T only requires bus and rapid transit passengers to interact with a fare reader when boarding a vehicle or entering a subway station (with the exception of the route 71 and 73 buses outbound, where customers pay on exit rather than entry). This means that our automated fare collection system collects data about where and at what times passengers pay or validate their cards, but doesn’t collect data about where passengers exit the system. In addition, while we know at which specific station each faregate is located, the fareboxes on buses and Green Line trains aren’t linked to the vehicle’s GPS, so they don’t record the stop a customer boards at, only the vehicle they get on and the time of the tap. 

Having quality data on the origin and destination of all trips would vastly improve the ability of the MBTA to plan service and understand where the network is crowded. In Washington, DC’s system, for example, passengers tap their SmarTrip card on exit, so WMATA’s planners and analysts know not only passengers’ origins but also their destinations, and therefore know with a high degree of precision how many people are on each subway train at each time.

To address this problem, Raphael Dumas, Jay Gordon, and Gabriel Sánchez-Martínez in MIT’s Transit Lab developed the ODX model, which stands for Origin, Destination, Transfer (or “Xfer”), as part of MIT’s ongoing research partnership with the MBTA. In short, this model looks at the series of transactions for a CharlieCard or Ticket over the course of the day and infers each trip’s origin, destination, and any transfers. (For simplicity, we will refer to all fare media as cards in this post.) To safeguard the privacy of passengers, all modeling is anonymous, and access to the data is secure.


Above: Using ODX, Jay Gordon produced this visualization of passenger journeys throughout one day in the T system. For more, see Jay's website.


How it works – the details

Inferring the origin is straightforward for trips that start at stations with fare gates: the model simply records the station and time the card was tapped. For bus and trolley trips, ODX also uses vehicle GPS data and matches the time of the tap with the vehicle’s position at that time to infer what stop passengers boarded at. 
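To make the matching step concrete, here is a minimal sketch in Python. It is not the ODX implementation: the record format (a time-sorted list of stop events for one vehicle), the rule of taking the most recent stop served at or before the tap, and the five-minute cutoff are all assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def infer_boarding_stop(tap_time, stop_events, max_gap=timedelta(minutes=5)):
    """Return the stop a vehicle most recently served at or before tap_time.

    stop_events is a list of (time, stop_id) tuples for ONE vehicle, sorted
    by time. This format is hypothetical; it stands in for whatever the
    vehicle GPS feed provides. max_gap is an assumed cutoff: if the vehicle
    was last seen at a stop too long before the tap, no stop is inferred.
    """
    times = [t for t, _ in stop_events]
    # Index of the last stop event whose time is <= tap_time.
    i = bisect_right(times, tap_time) - 1
    if i < 0:
        return None  # the tap happened before the first known stop event
    event_time, stop_id = stop_events[i]
    if tap_time - event_time > max_gap:
        return None  # vehicle position is too stale to trust
    return stop_id


# Hypothetical stop events for one bus on one morning.
events = [
    (datetime(2016, 3, 1, 8, 0), "stop_A"),
    (datetime(2016, 3, 1, 8, 4), "stop_B"),
    (datetime(2016, 3, 1, 8, 9), "stop_C"),
]
boarded_at = infer_boarding_stop(datetime(2016, 3, 1, 8, 5), events)
```

A tap one minute after the bus served stop_B is assigned to stop_B, while a tap with no nearby stop event is left without an inferred origin.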

To infer the destination of each trip, the model assumes that the passenger is going somewhere close to the origin of the next trip recorded on the card.  An optimization algorithm finds the best itinerary on the T network from the first origin to the next origin at that particular day and time, accounting for typical customer preferences about waiting, in-vehicle time, transfers, and walking.  The model extracts the destination place and time from this optimal itinerary.  For the last trip of the day, the model assumes that the passenger is returning close to the first origin of the day.   
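The chaining logic can be sketched in a few lines of Python. This is a deliberate simplification: where ODX searches for the best itinerary on the network, the sketch below simply picks the downstream stop closest to the next origin. The function names, the use of projected x/y coordinates in meters, and the one-kilometer walking limit are assumptions for illustration only.

```python
import math


def destination_targets(origins):
    """For each trip of the day, return the location the rider is assumed
    to be heading toward: the next trip's origin, or the day's first origin
    for the last trip. A card with a single tap gets no target at all.
    """
    if len(origins) < 2:
        return [None] * len(origins)
    return origins[1:] + origins[:1]


def infer_alighting_stop(downstream_stops, target, max_walk_m=1000):
    """Pick the downstream stop closest to the target location.

    downstream_stops: list of (stop_id, (x, y)) the vehicle serves after
    the boarding stop, in projected meters (hypothetical format).
    max_walk_m is an assumed limit; beyond it no destination is inferred.
    """
    if target is None or not downstream_stops:
        return None
    stop_id, coords = min(downstream_stops, key=lambda s: math.dist(s[1], target))
    if math.dist(coords, target) > max_walk_m:
        return None
    return stop_id


# Three taps in one day, at hypothetical locations (meters).
origins = [(0, 0), (5000, 0), (5000, 3000)]
targets = destination_targets(origins)

# Stops served by the first vehicle after the rider boarded.
stops = [("s1", (1000, 0)), ("s2", (4800, 100)), ("s3", (9000, 0))]
first_destination = infer_alighting_stop(stops, targets[0])
```

Here the first trip is assigned to stop s2, the stop nearest the second tap, and the last trip of the day is pointed back at the first origin.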

The model infers transfers in two ways, since the MBTA has both behind-the-gate transfers where you don’t have to validate your card again (for example, downtown stations like Government Center and State) and transfers where a second validation is required (switching from a bus to the subway or vice versa). ODX infers transfers made behind the gate from the optimal itinerary used for destination inference as described above.  Transfers between trips involving separate taps are inferred based on a set of spatial, temporal, and logical checks.  ODX infers a transfer when the time and distance between the destination of a trip and the origin of the following trip are short, and when the following trip does not return close to the first trip’s origin.  
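The checks for transfers between separately tapped trips might look something like the Python sketch below. The thresholds (400 meters, 30 minutes) and the trip record layout are hypothetical, chosen only to show how the spatial, temporal, and logical tests combine; the sketch also assumes both trips already have inferred destinations.

```python
import math
from datetime import datetime, timedelta

# Assumed thresholds for illustration; not the values ODX uses.
MAX_TRANSFER_WALK_M = 400
MAX_TRANSFER_WAIT = timedelta(minutes=30)
RETURN_RADIUS_M = 400


def is_transfer(first, second):
    """Decide whether `second` continues the journey begun by `first`.

    Each trip is a dict with 'origin' and 'dest' as (x, y) in projected
    meters and 'start' and 'end' as datetimes (hypothetical layout).
    """
    gap = second["start"] - first["end"]                      # temporal check
    walk = math.dist(first["dest"], second["origin"])         # spatial check
    # Logical check: a trip that heads back to where the first trip began
    # is treated as a return trip, not a transfer.
    returns = math.dist(second["dest"], first["origin"]) <= RETURN_RADIUS_M
    return (
        timedelta(0) <= gap <= MAX_TRANSFER_WAIT
        and walk <= MAX_TRANSFER_WALK_M
        and not returns
    )


bus = {"origin": (0, 0), "dest": (3000, 0),
       "start": datetime(2016, 3, 1, 8, 0), "end": datetime(2016, 3, 1, 8, 15)}
subway = {"origin": (3100, 0), "dest": (8000, 0),
          "start": datetime(2016, 3, 1, 8, 20), "end": datetime(2016, 3, 1, 8, 40)}
back_home = {"origin": (3100, 0), "dest": (100, 0),
             "start": datetime(2016, 3, 1, 8, 20), "end": datetime(2016, 3, 1, 8, 35)}

bus_to_subway = is_transfer(bus, subway)
quick_errand = is_transfer(bus, back_home)
```

The bus-to-subway pair passes all three checks, while the trip that heads straight back toward the first origin is treated as a separate journey.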


ODX cannot infer destinations for all types of trips. For example, when a passenger validates her card only once in the day, no data exists with which to infer her destination.  To account for these and other missing trips, ODX scales up the number of inferred passengers for each origin-destination pair to match the total number of passengers recorded at that origin. For example, if the model infers destinations for 85% of trips from State, the 15% that aren’t inferred are distributed proportionally across the inferred destinations. 
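As a worked example of this scaling, the Python sketch below distributes uninferred trips proportionally across the inferred destinations. The counts and destination labels are made up for illustration, and the function name is our own.

```python
def scale_to_origin_total(inferred_by_dest, total_at_origin):
    """Scale inferred origin-destination counts so that they sum to the
    total number of passengers recorded at the origin.

    inferred_by_dest: {destination: number of trips with an inferred destination}
    total_at_origin: all taps recorded at the origin, inferred or not
    """
    inferred_total = sum(inferred_by_dest.values())
    if inferred_total == 0:
        return {}  # nothing to scale from
    return {
        dest: count * total_at_origin / inferred_total
        for dest, count in inferred_by_dest.items()
    }


# Hypothetical numbers: 1,000 taps at one station, destinations inferred
# for 850 of them (85%).
inferred = {"destination_A": 510, "destination_B": 340}
scaled = scale_to_origin_total(inferred, 1000)
```

With these numbers, destination_A grows from 510 to 600 trips and destination_B from 340 to 400, so the 150 uninferred trips are split 60/40, the same shares the two destinations had among the inferred trips.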

Another limitation of the model is the assumption that the end of one trip is close to the beginning of the next. We know that this is not always true, as people may use another form of transport between trips on the MBTA, such as walking, a taxi, or bike share. Because this approach may yield incorrect destinations in such cases, several additional checks are applied to exclude any trips where riders appear to be traveling away from their next origin or where the distance between trips appears to be too far to walk. 

The model currently infers 97% of trip origins, 75% of destinations, and 92% of transfers.  We are validating the model using data from a group of MIT students and MBTA employees who have agreed to share both their CharlieCard data and data from the Moves app that shows all of their travel. This allows us to check how accurate the model’s predictions are. Graduate students at MIT are reviewing the data and flagging problems or improbable results for review.  

We currently run this model for every day of data and aggregate the results into 30-minute periods to get the total passenger demand between stops over the course of the day. We then average this over several days (such as all weekdays in a month or a season) to get an average level of passenger demand.
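The aggregation step can be sketched as follows in Python. The trip tuple layout and function names are assumptions for illustration, and the sketch averages over the days that appear in the input, so a day with no trips at all would not be counted.

```python
from collections import defaultdict
from datetime import datetime


def half_hour_bin(t):
    """Map a datetime to a 30-minute period index, 0 (midnight) to 47."""
    return t.hour * 2 + t.minute // 30


def average_demand(trips):
    """Average trips per day for each (origin, destination, period).

    trips: iterable of (origin, destination, start_datetime) tuples
    (hypothetical layout) covering one or more days.
    """
    counts = defaultdict(int)
    days = set()
    for origin, dest, start in trips:
        days.add(start.date())
        counts[(origin, dest, half_hour_bin(start))] += 1
    n_days = len(days)
    return {key: count / n_days for key, count in counts.items()}


# Hypothetical inferred trips over two weekdays.
trips = [
    ("A", "B", datetime(2016, 3, 1, 8, 5)),
    ("A", "B", datetime(2016, 3, 1, 8, 20)),
    ("A", "B", datetime(2016, 3, 2, 8, 10)),
    ("A", "B", datetime(2016, 3, 2, 8, 40)),
]
demand = average_demand(trips)
```

In this example the 8:00 to 8:30 period (index 16) averages 1.5 trips per day from A to B, and the 8:30 to 9:00 period (index 17) averages 0.5.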

Currently the ODX model has several applications at the MBTA: it provides the passenger arrival rates that are critical to calculating subway reliability, and it allows us to estimate the volume of passengers between stations by time period on the subway and light rail.  Several MIT students are using the data to explore how to improve service planning, to develop load profiles for bus routes, and to develop a model of crowding on the Red Line. As the model is continually refined and eventually finalized, more applications will emerge. Stay tuned to the blog for details on a few examples.