Behind the scenes with MBTA data.

In our previous post about passenger walk distances, we used the Rider Census to examine how accessible transit is to its users and found that passengers walked further than the assumed half-mile to stations at the ends of the Red and Orange Lines, while they walked less than this to stations in the center of our region. Our main conclusion, which is perhaps obvious, was that the structure of the network itself has a large impact on how passengers interact with the network.

We wanted to use this data set to look at passengers’ entire journeys rather than just their access point. To do so, we developed a metric we call “substitution propensity.” In a transportation network, each station is only attractive for a set number of destinations. For example, Savin Hill is a station on the Red Line, so Savin Hill is useful for trips north to downtown Boston. However, for trips west to Ruggles or Dudley Square, Savin Hill is not as useful; it’s likely that people would walk to the nearby stop for the 15 bus instead. In other cases, two nearby stations might serve very similar journeys: for example, much of the E branch of the Green Line and the Orange Line run nearly parallel to each other.

Substitution, as it relates to walkability, is defined here as the propensity with which passengers exclusively choose a particular route over other nearby alternative routes. Substitution explains differences in how passengers choose to access MBTA services: passengers will walk longer distances in areas where there are fewer service options. This is also a useful metric for determining what qualities passengers value in MBTA services. For example, there may be situations in which bus routes are not substituted for rail routes even when the bus route is faster, because passengers may value frequency over faster travel times.


To measure substitution, we used the 2015-2017 Rider Census data, which includes information about the most recent journey survey respondents took using the MBTA system. We categorized each journey by its starting mode, or the type of service used at the start of the respondent’s journey, and its ending mode, or the type of service used at the end of the respondent’s journey. We defined four categories for the starting mode and ending modes: commuter rail, bus, light rail (the Green and Mattapan lines), and heavy rail (the Red, Orange, and Blue Lines). This resulted in each journey being assigned to one of 16 categories. To give an example, for a passenger who begins their journey at Lynn, takes the commuter rail to North Station, transfers to the Green Line and finishes their journey at Prudential, the journey would be classified as “Commuter Rail to Light Rail.”
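The classification step can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the service names, the `MODE_BY_SERVICE` lookup, and the rule that anything not listed is commuter rail are all assumptions made for the example.

```python
# Map each MBTA service to one of the four mode categories described above.
# The service names used as keys are illustrative assumptions.
MODE_BY_SERVICE = {
    "Red Line": "Heavy Rail",
    "Orange Line": "Heavy Rail",
    "Blue Line": "Heavy Rail",
    "Green Line": "Light Rail",
    "Mattapan Line": "Light Rail",
}


def mode_of(service: str) -> str:
    """Return the mode category for a service name."""
    if service in MODE_BY_SERVICE:
        return MODE_BY_SERVICE[service]
    if service.startswith("Bus"):
        return "Bus"
    # Assumption: anything else in this sketch is a commuter rail line.
    return "Commuter Rail"


def classify_journey(services: list[str]) -> str:
    """Label a journey by the modes of its first and last services."""
    return f"{mode_of(services[0])} to {mode_of(services[-1])}"


# The Lynn-to-Prudential example: commuter rail first, Green Line last.
print(classify_journey(["Newburyport/Rockport Line", "Green Line"]))
```

Only the first and last services matter for the label, so any transfers in the middle of a journey do not change its category.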

While the survey data provided helpful insights on clustering and completed journeys, we had to account for undersampled evening commutes in the data set. We assumed that the trips from point A to point B by morning commuters are duplicated as trips from point B to point A by those same commuters in the evening, assuming that passengers use the same MBTA service for both commutes.
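As a rough sketch of that mirroring step, the snippet below appends a reversed copy of every journey. The `Journey` record and its field names are hypothetical, and swapping the starting and ending modes on the return trip is our reading of "the same MBTA service for both commutes" rather than a documented rule.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Journey:
    origin: tuple[float, float]       # (latitude, longitude)
    destination: tuple[float, float]  # (latitude, longitude)
    start_mode: str
    end_mode: str


def with_return_trips(journeys: list[Journey]) -> list[Journey]:
    """Append a mirrored copy of every journey (B to A for each A to B)."""
    mirrored = [
        replace(
            j,
            origin=j.destination,
            destination=j.origin,
            # Assumption: the same services are used in reverse order,
            # so the starting and ending modes swap as well.
            start_mode=j.end_mode,
            end_mode=j.start_mode,
        )
        for j in journeys
    ]
    return journeys + mirrored
```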

We then used the k-nearest-neighbors algorithm for each journey in the 2015-17 Rider Census to select the ten most similar origin-destination pairs. We determined similarity on the basis of a passenger’s origin and destination locations. The origin location would be the latitude-longitude coordinates of the street intersection nearest to the passenger’s home, and the destination location would be the latitude-longitude coordinates of the street intersection nearest to their workplace. The ten most similar journeys were determined using four-dimensional Euclidean distance, where the four dimensions are the longitude and latitude of the passenger’s origin point and the longitude and latitude of the passenger’s destination point. We calculated the percentage of the ten most similar journeys that belonged to the same category as the journey itself. That measure is the propensity for substitution. For a given set of similar origin-destination pairs, if passengers’ journeys vary greatly in category, the percentage approaches 0%. If the journeys do not vary, the percentage approaches 100%.
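A minimal, brute-force sketch of this calculation in plain Python is shown here. It is not the code used for the analysis: the tuple layout and the function name `same_category_share` are assumptions, and the distance is taken on raw latitude and longitude values, as the description above implies.

```python
import math

# Each journey: (origin_lat, origin_lon, dest_lat, dest_lon, category).
Journey = tuple[float, float, float, float, str]


def same_category_share(journeys: list[Journey], index: int, k: int = 10) -> float:
    """Share of the k nearest journeys that share journeys[index]'s category."""
    target = journeys[index]
    others = [j for i, j in enumerate(journeys) if i != index]
    if not others:
        raise ValueError("need at least one other journey")
    # Four-dimensional Euclidean distance on the raw coordinates.
    others.sort(key=lambda j: math.dist(target[:4], j[:4]))
    neighbors = others[:k]
    matches = sum(1 for j in neighbors if j[4] == target[4])
    return matches / len(neighbors)
```

Sorting every other journey for each query is quadratic in the number of journeys; for a full survey data set, a spatial index such as a k-d tree would be the usual replacement.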

Next, we mapped the substitution metric in QGIS. The survey data was converted to a spatial point data set, with the location of the point determined by the latitude-longitude coordinate of the origin location. We duplicated the survey data while reversing the origin locations and the destination locations, effectively mapping every journey as two points: one representing the origin location and the other representing the destination location. Adjacent points were grouped into 500m hexagons, and the average propensity for substitution was calculated for each hexagon. At 100%, the ten nearest neighbors of journeys that started or ended in that hexagon were, on average, all taken using the same MBTA services. Alternatively, at 0%, the ten nearest neighbors of journeys that started or ended in that hexagon were taken using different MBTA services.



A few interesting trends are shown in the substitution map above. Immediately beyond the terminal stations of the Red and Orange Lines, the metric approaches 0%; this is probably because some passengers choose to walk to the Red and Orange Line stations, while other passengers choose to take a bus. Near terminal stations that have large average walk distances, many passengers choose to take other MBTA services rather than walk, since not every passenger finds a walk of that length acceptable. Another interesting observation is that substitution near Andrew and Broadway, the two Red Line stations that serve South Boston, is relatively low; this is most likely because passengers are choosing to take one of the many bus routes rather than the Red Line. In fact, the eastern half of South Boston has a cluster of hexagons with percentages over 80%, meaning that the bus route is practical enough that passengers forgo the walk to Broadway or Andrew.

To illustrate the usefulness of this approach, we conducted an analysis focused specifically on South Boston. Five bus lines converge on City Point at the edge of South Boston: Routes 5, 7, 9, 10, and 11. We filtered the survey data to identify trips that started or ended with one of those bus lines (n=696), and since the survey data is biased towards morning trips, duplicated the survey data while flipping the starting and ending locations. We then applied the same k-nearest-neighbors algorithm to the data, and mapped the data using the same procedure. The resulting data showed the same cluster around City Point where all five of the bus lines converge.

Subsequently, we grouped the individual points using the k-nearest neighbor algorithm into twenty clusters. The four variables we used to cluster the data were the origin location latitude, origin location longitude, destination location latitude, and destination location longitude. We filtered out the clusters with fewer than 20 data points, leaving twelve clusters, which enabled us to identify unusual trip patterns and ignore them. For each usable cluster, we calculated the average substitution percent and plotted the clusters as lines, with the endpoints of the lines representing the average origin and destination locations of passenger journeys in that particular cluster.
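The filtering and averaging that follow the clustering can be sketched as below. The example assumes each point already carries a cluster label from the clustering step (which is not reproduced here); the tuple layout and the name `summarize_clusters` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Each point: (cluster_id, origin_lat, origin_lon, dest_lat, dest_lon, substitution).
Point = tuple[int, float, float, float, float, float]


def summarize_clusters(points: list[Point], min_size: int = 20) -> dict[int, dict]:
    """Average substitution and line endpoints for each sufficiently large cluster."""
    groups: dict[int, list[Point]] = defaultdict(list)
    for p in points:
        groups[p[0]].append(p)

    summary = {}
    for cluster_id, members in groups.items():
        if len(members) < min_size:
            continue  # too few journeys: treated as an unusual trip pattern
        summary[cluster_id] = {
            "n": len(members),
            # Endpoints of the plotted line: mean origin and mean destination.
            "origin": (fmean(m[1] for m in members), fmean(m[2] for m in members)),
            "destination": (fmean(m[3] for m in members), fmean(m[4] for m in members)),
            "avg_substitution": fmean(m[5] for m in members),
        }
    return summary
```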

The resulting map illustrates that passengers using the bus network, whose journeys start or end near the western portion of South Boston, typically use the same bus route. Passengers whose journeys begin near Andrew or Broadway, however, use different bus routes to make the same journey. This is potentially a sign that some of the bus routes in South Boston could be consolidated without substantially impacting passenger experience.


In the last two posts, we have used the Rider Census data set to examine how people access transit in greater detail than is usually possible. First, we found that the distance traveled to access transit on foot varies much more than the commonly applied rule of thumb of ½ mile. In this post, we found that people, perhaps unsurprisingly, use different transit services when they have multiple options. Importantly, we do not know from this analysis if an individual might choose different services on different days, nor the reasons why they might choose one service over another. Future analysis can examine these questions, using this and other survey data.

To use the MBTA, passengers typically have to walk, drive, or otherwise travel between our stations and their homes, offices, and schools. The question of how passengers travel between stations and their ultimate origin or destination is called the “last mile problem.” Typically, when the MBTA tries to answer questions involving the last mile problem (e.g., determining how many jobs are within walking distance of T stations), we assume that passengers won’t walk more than half a mile. However, studies of walking distances of different subway networks have found that walk distances vary considerably from station to station. In this blog post, we are going to explore how walk distances may vary from station to station in our MBTA network. 

For this post, we’re using survey answers from our most recent Rider Census, where passengers were asked to provide information about their most recent trip on the MBTA, including the location of their origin and destination. This provides us an opportunity to calculate how far passengers walk between their ultimate origins and destinations and MBTA stations. For each rail station and Silver Line station, as well as for each bus line, we used bootstrapping to calculate a confidence interval for the average distance passengers walk to and from MBTA stops. We then focused our analysis on the Red and Orange Lines, and identified three interesting trends: passengers walked longer distances to reach stations at the ends of the Red and Orange Lines, passengers walked shorter distances to stations constrained by bodies of water, and passengers walked shorter distances to stations in the middle of the Orange Line. 

Methodology and Data Sources

As mentioned, the MBTA and CTPS recently conducted a systemwide passenger survey. For the survey, we asked passengers about their most recent trip on the MBTA. The survey asked passengers to list their origin and destination locations: where they were coming from before arriving at an MBTA stop or station, and where they were traveling to after completing their trip on the MBTA. They were able to classify these locations in a variety of ways, like home, workplace, school, etc. The survey then asked passengers to list their mode of travel (driving, walking, biking, or use of a non-MBTA service) when going to and from the T in order to learn more about this “last mile.” Passengers listed the specific MBTA service they used (e.g. Green Line, bus route 7, Fitchburg Line, etc.) and at what specific stops they boarded and alighted. Passengers also provided basic demographic information.

Not every respondent provided an origin or destination location, so we separated the dataset into two groups: responses that included an origin location, and responses that included a destination location. (Responses that included both an origin and destination location were counted twice.) Since we are investigating walkability, we filtered the datasets so that they only contained responses from passengers who identified their access and egress modes as walking. This left 15,934 responses from passengers who identified an origin location and walked to their first MBTA boarding and 18,161 responses from passengers who identified a destination location and walked from their last MBTA alighting.

For each of the responses, we calculated the walk distance as the straight-line distance in meters from the origin location to the stop where the passenger boarded their first MBTA service, or from the stop where they exited their last MBTA service to the destination location. In cases where passengers were using rail or Silver Line service, the survey identified the exact stop at which passengers boarded and exited the service. However, in cases where passengers were using bus service, the survey did not identify the exact stop at which passengers boarded and exited; the survey only identified the bus line that passengers took. Therefore, we assumed that bus passengers would walk to the bus stop closest to their origin or destination location, and used the bus stop nearest to the passenger’s origin or destination location to calculate the walk distance.
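One way to compute such distances is sketched below. The haversine (great-circle) formula and the Earth radius constant are our assumptions for the example; the post only says "straight line distance in meters" and does not name the formula or projection that was used. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters; an assumed constant


def straight_line_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle (haversine) distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def walk_to_nearest_stop_m(location, stops) -> float:
    """Distance from a (lat, lon) location to the closest stop on the reported bus line."""
    return min(straight_line_m(*location, *stop) for stop in stops)
```

For rail and Silver Line responses, `straight_line_m` would be called directly with the reported station; `walk_to_nearest_stop_m` covers the bus case, where only the line is known.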

Finally, we filtered out stops and bus lines that had fewer than thirty data points. The Green and Blue Lines did not have many stations with more than thirty data points, whereas the Red and Orange Line stations all had more than thirty data points each. Therefore, we decided to focus on the Red and Orange Lines for the purposes of this blog post. We mapped the mean and median walk distances for the Red and Orange Lines in QGIS (we did not map the walk distances for Downtown Crossing, as that station is shared by the Red and the Orange Line).
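The confidence intervals for the mean and median walk distances come from bootstrapping. A minimal sketch of a percentile bootstrap, combined with the thirty-data-point filter, is shown here; the number of resamples, the 95% level, the fixed seed, and the function names are all assumptions, since the post does not state which bootstrap variant was used.

```python
import random
from statistics import fmean, median


def bootstrap_ci(values, stat=fmean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


def station_intervals(walks_by_station, min_points=30):
    """Mean and median CIs for every station with at least `min_points` walks."""
    return {
        station: {
            "mean_ci": bootstrap_ci(d, fmean),
            "median_ci": bootstrap_ci(d, median),
        }
        for station, d in walks_by_station.items()
        if len(d) >= min_points
    }
```

Here `walks_by_station` is assumed to map a station name to a list of walk distances in meters.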

Possible limitations of the data include:

  • The number of responses for each station and line is not proportional to the ridership of the respective stations/lines.
  • Women, English speakers, and regular MBTA passengers were more likely to respond to the survey.
  • Because passengers were asked to describe their most recent trip, the survey responses were often biased towards trips taken in the morning.


Line | Station | Number of Datapoints | Mean Walk Distance (m) | Mean Lower CI (m) | Mean Upper CI (m) | Median Walk Distance (m) | Median Lower CI (m) | Median Upper CI (m)
Orange Line | Assembly Station | 57 | 513.657 | 437.840 | 587.424 | 320.230 | 200.572 | 320.230
Orange Line | Back Bay | 388 | 497.198 | 359.258 | 603.427 | 331.830 | 311.550 | 333.988
Orange Line | Downtown Crossing | 387 | 462.723 | 366.399 | 545.144 | 301.413 | 286.933 | 330.088
Orange Line | Forest Hills | 119 | 712.659 | 578.353 | 839.518 | 429.573 | 325.896 | 507.278
Orange Line | Malden Center | 151 | 718.944 | 527.984 | 852.199 | 561.643 | 535.360 | 579.449
Orange Line | Mass Ave | 221 | 466.500 | 382.312 | 532.923 | 261.808 | 199.563 | 261.808
Orange Line | North Station | 303 | 558.156 | 299.258 | 709.093 | 272.793 | 234.126 | 272.793
Orange Line | Oak Grove | 77 | 1142.579 | 778.101 | 1446.468 | 716.906 | 647.668 | 904.608
Orange Line | Sullivan Square | 90 | 674.677 | 547.066 | 786.468 | 432.475 | 334.869 | 433.871
Red Line | Alewife | 188 | 832.947 | 647.254 | 980.625 | 658.636 | 558.152 | 727.017
Red Line | Charles MGH | 934 | 261.047 | 238.093 | 280.972 | 175.378 | 140.958 | 175.378
Red Line | Davis Square | 462 | 787.405 | 493.758 | 961.490 | 585.864 | 544.225 | 650.085
Red Line | Downtown Crossing | 501 | 526.546 | 412.159 | 621.614 | 306.902 | 290.354 | 339.374
Red Line | Kendall Square | 1218 | 421.666 | 400.187 | 441.639 | 315.829 | 315.829 | 315.829
Red Line | South Station | 684 | 424.869 | 354.815 | 484.151 | 264.058 | 219.488 | 282.041


Confidence Interval = CI

* Click the link to view the above table with more stations listed.

An image of the mean walk distances for the Red and Orange Lines.

An image of the median walk distances for the Red and Orange Lines.


There are a number of interesting conclusions that can be drawn from the mean and median walk distances from each station. We have tried classifying them into a few main trends as explained below:

Physical Landscape — Safety & Geography

The Charles MGH station is notable for having substantially lower median and mean walk distances than the other Red Line stations. There are a few possible explanations for this. First, the built environment of Charles MGH is particularly inconvenient to pedestrians: the station has only two crosswalks and two entrances, and is surrounded by busy roads. Pedestrians are also constrained by two geographic features, Beacon Hill (the hill, not the neighborhood) and the Charles River, which could limit how far pedestrians are able to walk to reach Charles MGH. Kendall Square, which has the fourth lowest mean walking distance out of all Red Line stations, is also adjacent to the Charles River, which provides further evidence that bodies of water like the Charles River affect station walkability. A similar effect can be seen at Assembly Station on the Orange Line, which is surrounded by the Mystic River and Interstate 93, and has lower mean and median walk distances than the adjacent stations (Sullivan Square and Malden).

Last Stations on Subway Lines

Stations at the ends of the Red and Orange Lines—Alewife, Davis, Forest Hills, Malden, and Oak Grove—tend to have larger average and median walk distances. This is probably because passengers who live beyond the reach of the Red and Orange Lines prefer the Red and Orange Lines to alternative MBTA services (the bus network and the commuter rail), and are willing to walk further distances to reach the Red and Orange Lines. 


The stations at the center of the Orange Line, beginning at around Mass Ave and ending at around North Station, tend to have lower medians and means compared to other Orange Line stations. There are a few possible explanations for this. First, this section of the Orange Line is not only very close to the E branch of the Green Line, but also runs parallel to it. This means that passengers can choose between the E branch and the Orange Line, and it’s likely that one of the factors that goes into that decision is which line has the closer station, so passengers are likely minimizing their walking distances along that section of the Orange Line. Another factor is that stations in the center of the Orange Line are particularly close together, which could reduce how far passengers need to walk to reach an Orange Line station.

Transportation is responsible for a significant chunk of the carbon emissions that are causing climate change. The IPCC has found that approximately one-quarter of global CO2 emissions in 2014 were from the transportation sector, and that this sector has seen faster emissions growth than any other. Public transit is one of many solutions that can help us reduce our collective transportation emissions. Trains and buses lower emissions because they can efficiently move many people at once. Additionally, the more priority (bus lanes, transit signal priority, etc.) that we are able to give buses in particular, the larger the emissions savings and the better the experience for our riders.

The MBTA, in partnership with policy makers, municipalities, and businesses, has a very important role to play in making it easier for people to make sustainable transportation choices. One of the ways that the T works to get people out of single occupancy vehicles and onto trains and buses is through our Perq program, formerly known as the Corporate Pass Program. Through Perq, employers can offer pretax or subsidized monthly passes to their employees. Perq is a way for employers to incentivize their employees to replace vehicle trips with transit for their daily commute. According to AASHTO, work trips make up 19% of all person miles traveled in the U.S. However, access to a subsidized transit pass increases your likelihood of taking transit for other, non-commute trips as well. Employees and employers have pointed out that the “ease-of-use” aspect of employer-provided MBTA passes, in addition to the cost savings, also increases the likelihood that employees will use transit to commute.

We know that mass transit has a lower carbon footprint than driving in most situations, and having convenient access to an MBTA pass can make the choice between driving and taking transit a little bit simpler. But just how big of an effect can employer-provided transit passes actually have on emissions? To try to answer this question, we partnered with a large education company based in Cambridge that participates in the Perq program. We looked into their employees’ transit patterns in aggregate to try to estimate just how many emissions they are potentially offsetting.

What trips are employees making in the first place?

In order to know what emissions are being saved (or generated) through this company’s transit pass program, we first have to know which trips employees are actually making. How are they traveling? On which modes? Trip lengths and mode choices can significantly change the environmental benefits of transit.

To begin, we identified all Perq CharlieCards in use by employees in September of 2018, the last month for which we have complete, processed trip data. We ran those card numbers through our ODX model in order to identify every trip (an origin station and a destination station) that was taken that month. Our final dataset included each unique origin-destination (OD) pair, how many times that trip was taken, and whether the primary mode used was subway or bus.  

This analysis is structured to protect riders' privacy by keeping trips anonymous – our final model contains only OD pairs and modes and removes any personal CharlieCard information. 

We ran each OD pair identified by the ODX model through the Google Distance Matrix API in order to get the distance, in miles, for each trip if it was taken using transit, and then again for the driving equivalent. For example, the most common trip taken in this dataset was from Community College to Oak Grove on the Orange Line. This trip is approximately 4.6 miles on the train, but nearly double that (8.7 miles) when driving. With the ODX and API data, we are able to calculate the total number of passenger miles taken on bus, subway, and equivalent driving trips. 
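The aggregation into passenger miles can be sketched as follows. Only the 4.6-mile and 8.7-mile distances for Community College to Oak Grove come from the text above; the trip counts, the second record, and the function name `passenger_miles` are made up for illustration, and the distance lookups themselves are assumed to have been done already.

```python
from collections import defaultdict

# One record per origin-destination pair:
# (origin, destination, mode, trip_count, transit_miles, driving_miles).
# Trip counts and the "Stop A"/"Stop B" record are hypothetical.
od_pairs = [
    ("Community College", "Oak Grove", "subway", 40, 4.6, 8.7),
    ("Stop A", "Stop B", "bus", 10, 2.0, 2.5),
]


def passenger_miles(records):
    """Sum trips x distance by transit mode, plus the driving equivalent."""
    totals = defaultdict(float)
    for _origin, _dest, mode, trips, transit_mi, driving_mi in records:
        totals[mode] += trips * transit_mi
        # Every transit trip also contributes its equivalent driving mileage.
        totals["driving"] += trips * driving_mi
    return dict(totals)


print(passenger_miles(od_pairs))
```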

The Perq program, however, does not just facilitate bus and subway trips. It also allows employers to provide their employees with Commuter Rail passes. Because the Commuter Rail does not use any automated fare collection systems, the only information that we have about how employees used Commuter Rail is how many passes were purchased for each zone. There is no way to identify particular stations, particular lines, or trip frequencies for these employees. Additionally, zone passes are not assigned to specific employees in our data, so this analysis is inherently anonymous.

We do know more generally how Commuter Rail riders behave, through a variety of surveys, including monthly panel surveys and biannual Keolis passenger surveys. While it can be difficult to relate reported behavior to actual behavior, we made assumptions about the average behavior of a Commuter Rail rider and assumed that employees who choose to commute on the Commuter Rail behave similarly. The vast majority of trips taken on the Commuter Rail are in the peak direction and include the terminal station of the line (either North Station or South Station). This is likely particularly true for these employees given the location of their office near North Station. In addition, Commuter Rail riders who use the service for work tend not to use transit for other trip purposes.

In order to estimate trip distances for employees that ride Commuter Rail, we found the average transit and driving distances for all stations within a zone to their respective terminal stations. Driving mileage was calculated using the Google API, whereas we used track distances to determine the transit mileage. Take, for example, the Zone 7 stations in the table below. We identified the transit and driving distances between each Zone 7 station and the terminal station on its line. From those distances, we determined the average transit and driving mileage from a Zone 7 station to downtown Boston. There were two employees who received Zone 7 Commuter Rail passes through Perq, and so we assumed that their trip lengths were equal to the zone average.

Stop Name | Terminal Station | Transit Distance (Miles) | Driving Distance (Miles)
Bradford | North Station | 32.5 | 37.4
Gloucester | North Station | 31.6 | 36.6
Haverhill | North Station | 32.9 | 37.1
Littleton/Rt 495 | North Station | 30.1 | 36.1
Rowley | North Station | 31.2 | 35.8
West Gloucester | North Station | 29.6 | 34.9
Attleboro | South Station | 31.8 | 39.4
Halifax | South Station | 28.1 | 35.4
South Attleboro | South Station | 36.8 | 44.5
Westborough | South Station | 34.0 | 36.0
ZONE 7 AVERAGE | | 31.9 | 37.3
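The zone averages can be reproduced directly from the station rows. This short check uses the distances from the Zone 7 table; the variable names are ours.

```python
from statistics import fmean

# (stop, terminal, transit_miles, driving_miles), copied from the Zone 7 table.
ZONE_7 = [
    ("Bradford", "North Station", 32.5, 37.4),
    ("Gloucester", "North Station", 31.6, 36.6),
    ("Haverhill", "North Station", 32.9, 37.1),
    ("Littleton/Rt 495", "North Station", 30.1, 36.1),
    ("Rowley", "North Station", 31.2, 35.8),
    ("West Gloucester", "North Station", 29.6, 34.9),
    ("Attleboro", "South Station", 31.8, 39.4),
    ("Halifax", "South Station", 28.1, 35.4),
    ("South Attleboro", "South Station", 36.8, 44.5),
    ("Westborough", "South Station", 34.0, 36.0),
]

avg_transit = fmean(row[2] for row in ZONE_7)
avg_driving = fmean(row[3] for row in ZONE_7)
print(f"{avg_transit:.1f} {avg_driving:.1f}")  # prints: 31.9 37.3
```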

For these trips, we assumed that employees, on average, took the commuter rail every workday except one (in September 2018 that translates to 18 round trips). 
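A sketch of how the 18 round trips and the resulting Commuter Rail passenger miles might be derived is shown here. Treating Labor Day (September 3, 2018) as the only non-working weekday is our assumption about how the figure arises, and the function names are hypothetical.

```python
from datetime import date, timedelta


def workdays(year, month, holidays=()):
    """Count Monday-Friday dates in a month, excluding the given holidays."""
    day = date(year, month, 1)
    count = 0
    while day.month == month:
        if day.weekday() < 5 and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count


# Assumption: Labor Day is the only weekday holiday in September 2018.
LABOR_DAY_2018 = date(2018, 9, 3)
round_trips = workdays(2018, 9, holidays={LABOR_DAY_2018}) - 1  # every workday except one


def commuter_rail_miles(passes, zone_avg_transit_miles, trips=round_trips):
    """Monthly passenger miles for all pass holders in one zone."""
    # Each round trip covers the zone-average distance twice.
    return passes * trips * 2 * zone_avg_transit_miles


print(round_trips)  # prints: 18
```

For the two Zone 7 pass holders and a 31.9-mile zone average, `commuter_rail_miles(2, 31.9)` comes to roughly 2,297 passenger miles for the month.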

By the end of this process, we have calculated: 

1) total passenger miles on buses, 

2) total passenger miles on subway, 

3) total passenger miles on Commuter Rail, and 

4) total passenger miles driven for the equivalent of all transit trips. 

Total Passenger Miles, by mode  

Mode | Total Estimated Passenger Miles, September 2018
Transit | 30,255
Bus | 3,775
Subway | 16,056
Commuter Rail | 10,444
Driving | 41,695

How do those trips translate to emissions?

Estimating carbon emissions that result from different modes of transportation is a difficult process because there are so many confounding factors. Not only does the mode matter, but the age of a vehicle, the speed at which it is traveling, the condition of the road or track, and so much more can impact the emissions released. 

Thankfully, the Massachusetts Department of Environmental Protection has done a lot of this thinking already. MassDEP developed a carbon emissions calculator that takes into account the unique transportation landscape in Massachusetts to estimate emissions factors for each mode. The tool is slightly out of date, with some of the data being sourced from 2012. We are working to update the tool and come up with more accurate emissions factors, but as of now, the calculator is the most reliable source specific to our region. 

To get total carbon emissions, we ran the total estimated passenger miles by mode through the calculator. 


After running the DEP calculator, we can compare multiple scenarios to see just how many emissions the Perq program is helping to offset. 

Spoiler Alert: Companies that encourage transit use can save A LOT of carbon emissions.

The number we are relatively sure of is just how many passenger miles were ridden on transit, which the DEP calculator reports as having generated 9,810 pounds of CO2 in September of 2018. What we are less sure of is just how many of those trips replaced car trips. If we assume that every transit trip replaced a driving trip, we could estimate that 41,705 pounds of CO2 would have been generated by driving. This means that riders who took transit reduced their emissions by up to 76%.

Realistically, we know that not all transit trips replaced car trips. Some were likely biked, walked, already taken on transit, or not taken at all, meaning that these trips were either net neutral or actually generated emissions. Without a company-specific travel survey, it is difficult to know exactly how employees behaved before the Perq pass, so we had to make some assumptions. To create a lower-bound estimate, we took MassDOT’s definition of a “bikeable distance,” and assumed that for every trip under six miles, the Perq program actually generated emissions. In this case, the comparison point for driving emissions is 24,977 pounds of CO2. Even in this fairly absurd lower-bound scenario (Was everyone who was traveling six miles or less biking or walking? Probably not.), riders who took transit still saw an overall emissions reduction of roughly 60%. In the table below, you can see the amount of CO2 generated by driving in a variety of scenarios.

Scenario | Lbs of CO2 from driving
All trips would have happened in a car | 41,705
All trips over one mile would have happened in a car | 41,453
All trips over one and a half miles would have happened in a car | 40,882
All trips over two miles would have happened in a car | 39,970
All trips over four miles would have happened in a car | 31,476
All trips over six miles would have happened in a car | 24,977
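The percentage reductions quoted above follow from comparing the 9,810 pounds of transit emissions with each driving scenario. A small sketch of that arithmetic, using the figures from the table (the scenario labels and function name are ours):

```python
TRANSIT_LBS_CO2 = 9_810  # transit emissions reported for September 2018

# Driving-equivalent emissions (lbs of CO2) for each scenario in the table.
DRIVING_LBS_CO2 = {
    "all trips": 41_705,
    "over 1 mile": 41_453,
    "over 1.5 miles": 40_882,
    "over 2 miles": 39_970,
    "over 4 miles": 31_476,
    "over 6 miles": 24_977,
}


def percent_reduction(driving_lbs, transit_lbs=TRANSIT_LBS_CO2):
    """Percent of driving emissions avoided by taking transit instead."""
    return 100 * (driving_lbs - transit_lbs) / driving_lbs


for scenario, lbs in DRIVING_LBS_CO2.items():
    print(f"{scenario}: {percent_reduction(lbs):.1f}%")
```

The first and last scenarios work out to about 76.5% and 60.7%, which bracket the range discussed in the text.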

What is a pound of CO2 after all? Are these savings very big? Try putting some of these values into the EPA emissions equivalencies calculator below. Remember that these savings are for just one company in one month – the Perq program works with approximately 1,500 companies and is constantly growing.


Transportation is the biggest contributor to carbon emissions in the state of Massachusetts, contributing 43% of the state’s total emissions, a share higher than the US and global averages. Replacing as many single-occupancy vehicle trips as possible with transit or active modes is one of the most effective ways we can reduce our carbon emissions. Employers have an important role to play in reducing the carbon footprint of commuters, and many who currently partner with the MBTA are thinking of innovative solutions to do just that. More information about the Perq program – one of many solutions for employers looking to lower their carbon footprint – is available here.