Behind the scenes with MBTA data.

Here at the MBTA, we keep a close eye on ridership trends and are always working on ways to better collect and visualize ridership data. After the unfortunate derailment on the Red Line in June that drastically affected service throughout the summer, we used Tableau to explore how ridership on the line had been impacted. This post will examine some of the changes in ridership we saw in the period following the derailment as normal service was restored.

The Data

Readers of the blog and transit data enthusiasts will remember that “ridership” does not mean any particular measure, and the ridership we report to the public is estimated based on multiple data sources and historically based factors. To examine the effects of the derailment, we used card and ticket validations (referred to as “taps” in this post for simplicity) at the gates of stations that serve the Red Line. Gate tap data reliably record the majority of people passing through the gates, are comparable to other time periods going back to 2013, and are available at a very granular level. For stations where passengers can board multiple lines (in this case, Downtown Crossing and Park Street), we used the same “split” factors that we used for the dashboard and other reporting to assign a portion of their entries to the Red Line. We did not use any factor to estimate non-interaction; our assumption throughout this post is that non-interaction was roughly static throughout the time periods examined. We also did not attempt to account for passengers who board the Red Line via transfer from another line. It seems possible that fewer passengers than usual transferred from other lines given the reduced service levels, but we had no reasonable way to measure this.

To explore the data, we queried our research database for all taps on all gates, grouping the data into 30-minute periods and adding attributes for the service date (measured from 3 AM until 2:59 AM the next morning), the type of service in effect for that date (Weekday, Saturday, Sunday, or Holiday), the station and other characteristics about the taps. Once this dataset was built, we loaded it into Tableau and built some views to start exploring.
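As a rough illustration of the two time attributes described above, here is a minimal Python sketch. The function names are hypothetical and this is not the query we ran against the research database; it only shows the rule that a service date runs from 3 AM to 2:59 AM and that taps are grouped into 30-minute periods.

```python
from datetime import date, datetime, timedelta

SERVICE_DAY_START_HOUR = 3  # a service date runs from 3 AM to 2:59 AM the next morning


def service_date(tap_time: datetime) -> date:
    """Return the service date; taps before 3 AM count toward the previous day."""
    return (tap_time - timedelta(hours=SERVICE_DAY_START_HOUR)).date()


def half_hour_bin(tap_time: datetime) -> datetime:
    """Floor a timestamp to the start of its 30-minute period."""
    return tap_time.replace(minute=(tap_time.minute // 30) * 30, second=0, microsecond=0)


late_night_tap = datetime(2019, 6, 12, 1, 47, 10)  # made-up tap time
print(service_date(late_night_tap))   # 2019-06-11
print(half_hour_bin(late_night_tap))  # 2019-06-12 01:30:00
```

A tap at 1:47 AM on June 12 is therefore counted with the June 11 service date, in the 1:30 AM period.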

Service was affected differently in different parts of the Red Line. The initial derailment damaged the signal bunkers housed at JFK/UMass station and forced the Red Line to operate in manual block mode. Automatic signaling was fully restored from JFK/UMass north to Alewife on July 31, but automatic signaling was not restored on the Ashmont branch until September 11, and was not completely restored on the entirety of the Red Line until September 23. We also know that many people traveling from either Red Line branch do not frequently use the Red Line north of downtown, and similarly, that many people traveling from stations north of downtown are only going as far as downtown, or even just to Kendall. To examine how these geographically-distributed service impacts have affected ridership spatially, we divided the boardings into five groups based on area of the city. These groups were:


Area                    Stations Included
Cambridge / Somerville  Alewife, Davis, Porter, Harvard, Central, Kendall / MIT
Downtown                Charles / MGH, Park Street, Downtown Crossing, South Station
South Boston            Broadway, Andrew
Dorchester              JFK / UMass, Savin Hill, Fields Corner, Shawmut, Ashmont
Quincy / Braintree      North Quincy, Wollaston, Quincy Center, Quincy Adams, Braintree


The Results

To get an idea of longer-term trends on the Red Line, we put together the following chart, which shows daily weekday taps on the Red Line (all stations) over the last two years, with a 20-day moving average that smooths the data to show trends. You can see that we generally have a big dip in ridership in December (holidays like Thanksgiving and Christmas, when we run reduced service, are excluded, but we see lower ridership on the weekdays surrounding them). You can also see that we generally have our highest ridership from late September through October, when school is in session and there are few breaks in most people’s schedules. You can also see lower ridership in March 2018, when there were a number of storms that closed schools and otherwise affected ridership. Finally, you can see the drop in ridership over this summer, likely due to the impacts of the derailment. Ridership is usually low in the week around the 4th of July and towards the end of August, but this summer a decrease can be seen right around June 11 (the day of the derailment). While July had some higher-ridership days, overall ridership was about 5% lower than last summer.


Chart of Total Taps with Moving Average
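For readers who want to reproduce the smoothing outside Tableau, a moving average can be sketched in a few lines of Python. This sketch assumes a trailing window (each point averages the current day and the days before it); the post does not say whether the chart uses a trailing or centered window, and the tap counts below are made up.

```python
def moving_average(values, window=20):
    """Trailing mean over the last `window` values; None until the window fills."""
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_total -= values[i - window]
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


daily_taps = [90, 120, 120, 150, 90]  # made-up weekday totals
print(moving_average(daily_taps, window=3))  # [None, None, 110.0, 130.0, 120.0]
```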

As expected, ridership was less affected in places where service was less affected. Here are some views of the above chart, filtering to just the taps at the Cambridge / Somerville and Quincy / Braintree stations as grouped above. First, the charts show the last 15 months (showing the last two summers) with a 20-day moving average, then they show taps at the stations since May 2019, with a 10-day moving average. We chose the Quincy and Cambridge stations as they had the greatest difference in service as well as the greatest difference in ridership.

 A chart of the Quincy / Braintree branch with moving average

 A zoomed-in version of the chart of the Quincy / Braintree branch with moving average


 A chart of the Cambridge section with moving average


A zoomed-in version of the chart of the Cambridge section with moving average

You can see from the above charts that in Cambridge and Somerville, ridership returned close to its previous levels quite soon after the derailment, and with the exception of the 4th of July week, remained at this level until the last couple weeks of August. In Quincy and Braintree, however, ridership did not rebound to the same level, and this drop continued for the remainder of the summer. 

We took a look at the median weekday ridership compared to the previous year in each of the areas. The time periods here are divided into three: January 1 – May 31, June 1 – August 31, and September. For September 2019, the data is complete through September 27.


Change in Median Weekday Ridership
Area                    January-May  June-August  September
Cambridge / Somerville  2.0%         -1.3%        -1.7%
Downtown                1.2%         -3.0%        -2.1%
South Boston            -1.5%        -7.8%        -4.9%
Dorchester              0.4%         -9.3%        -3.3%
Quincy / Braintree      -3.9%        -11.9%       -3.1%
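Each figure in the table is a year-over-year change in a median. A minimal sketch of that calculation is below; the function name and the daily totals are made up for illustration. The example includes one unusually low day and one unusually high day to show why a median is less sensitive to such days than a mean.

```python
from statistics import median


def median_change_pct(previous_year, current_year):
    """Percent change in median daily taps from one period to the same period a year later."""
    previous_median = median(previous_year)
    return (median(current_year) - previous_median) * 100 / previous_median


taps_2018 = [1000, 1040, 980, 1020, 400]   # made-up totals; 400 stands in for a snow day
taps_2019 = [950, 990, 930, 2000, 970]     # made-up totals; 2000 stands in for a parade day
print(median_change_pct(taps_2018, taps_2019))  # -3.0
```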


In the first 5 months of 2019, ridership was generally steady or up slightly compared to the previous year. While part of this is attributed to the low ridership in March 2018 due to snow, we chose to use the median here to mitigate the effect of such days (as well as the abnormally high ridership on February 5, 2019 due to the Patriots’ championship parade). The exception to this trend was the Braintree branch, where ridership was down nearly 4%. While Wollaston station was closed in both time periods, this is likely due to increasing construction impacts from various projects along the branch, or perhaps due to people switching to Commuter Rail in the area.

After the derailment, we saw more disparate impacts. The Dorchester and Braintree branches saw the biggest drop in median ridership, likely because service was affected there the most and also because those areas have higher levels of car ownership (in the case of Quincy residents) and more alternate routes to downtown. Since Wollaston station re-opened, we might have expected a greater increase in Quincy and Braintree; however, we looked at the data and noticed that most Wollaston riders seemed to switch to North Quincy while Wollaston was closed (ridership at Wollaston, North Quincy and Quincy Center combined did not significantly change after Wollaston reopened). Ridership in Cambridge and Somerville barely dropped at all compared to the previous summer, which is likely an effect of service being better and there being fewer alternate routes: passengers could switch to the Commuter Rail at Porter, but if they were going somewhere for which that trip was convenient, they probably were taking the Fitchburg line already. Downtown ridership was down, but that is likely largely a product of the ridership in the other areas.

So far in September, we have seen ridership much closer to last September than over the summer, but in most areas, we are down a few percent. Some of this is due to missing data for the last few days of September – we tend to see higher ridership at the end of September than at the beginning. To be sure, we also took a look at the median ridership through the first 13 non-holiday weekdays of each month, as well as the averages. The medians were very close to the average ridership, and through the first 13 days, the changes between the median ridership in the two months were similar, as shown above.

Last month, as people returned from vacations and went back to school, ridership (as measured by taps at stations) rebounded on the Red Line compared to the summer, and overall is down 2.5% from last September. In Cambridge and Somerville, where service was least impacted by the derailment, ridership is nearly the same as last September and was only slightly down during the summer. In Braintree and Quincy, ridership is down nearly 4 percent, but there were still significant service impacts in this area into September. In South Boston and Dorchester, ridership is also down even though service is largely restored. It is possible that usual riders switched to another service or mode, and either found that this new method serves their trip better or are not aware that service has been restored. We will continue to watch ridership at these stations now that full service is restored and we move into our usual high ridership month of October.


In our previous post about passenger walk distances, we used the Rider Census to examine how accessible transit is to its users and found that passengers walked further than the assumed half-mile to stations at the ends of the Red and Orange Lines, while they walked less than this to stations in the center of our region. Our main conclusion, which is perhaps obvious, was that the structure of the network itself has a large impact on how passengers interact with the network.

We wanted to use this data set to look at passengers’ entire journeys rather than just their access point. To do so, we developed a metric we call “substitution propensity.” In a transportation network, each station is only attractive for a set number of destinations. For example, Savin Hill is a station on the Red Line, so Savin Hill is useful for trips north to downtown Boston. However, for trips west to Ruggles or Dudley Square, Savin Hill is not as useful; it’s likely that people would walk to the nearby stop for the 15 bus instead. In other cases, two nearby stations might serve very similar journeys: for example, much of the E branch of the Green Line and the Orange Line run nearly parallel to each other.

Substitution, as it relates to walkability, is defined here as the propensity of passengers to exclusively choose a particular route over other nearby alternative routes. Substitution explains differences in how passengers choose to access MBTA services: passengers will walk longer distances in areas where there are fewer service options. This is also a useful metric for determining what qualities passengers value in MBTA services. For example, there may be situations in which bus routes are not substituted for rail routes even when the bus route is faster, because passengers may value frequency over faster travel times.


To measure substitution, we used the 2015-2017 Rider Census data, which includes information about the most recent journey survey respondents took using the MBTA system. We categorized each journey by its starting mode, or the type of service used at the start of the respondent’s journey, and its ending mode, or the type of service used at the end of the respondent’s journey. We defined four categories for the starting mode and ending modes: commuter rail, bus, light rail (the Green and Mattapan lines), and heavy rail (the Red, Orange, and Blue Lines). This resulted in each journey being assigned to one of 16 categories. To give an example, for a passenger who begins their journey at Lynn, takes the commuter rail to North Station, transfers to the Green Line and finishes their journey at Prudential, the journey would be classified as “Commuter Rail to Light Rail.”
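The sixteen categories are simply every pairing of the four starting modes with the four ending modes. A short Python sketch follows; the label format comes from the example above, while the function and variable names are hypothetical.

```python
from itertools import product

MODES = ["Commuter Rail", "Bus", "Light Rail", "Heavy Rail"]


def journey_category(starting_mode: str, ending_mode: str) -> str:
    """Label a journey by the mode it starts on and the mode it ends on."""
    if starting_mode not in MODES or ending_mode not in MODES:
        raise ValueError("unknown mode")
    return f"{starting_mode} to {ending_mode}"


# Four starting modes times four ending modes gives the 16 categories.
ALL_CATEGORIES = [journey_category(start, end) for start, end in product(MODES, repeat=2)]
print(len(ALL_CATEGORIES))                             # 16
print(journey_category("Commuter Rail", "Light Rail"))  # Commuter Rail to Light Rail
```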

While the survey data provided helpful insights on clustering and completed journeys, we had to account for undersampled evening commutes in the data set. We assumed that the trips from point A to point B by morning commuters are duplicated as trips from point B to point A by those same commuters in the evening, assuming that passengers use the same MBTA service for both commutes.
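One way to express this mirroring step is sketched below. The `Journey` record and its field names are hypothetical, and swapping the starting and ending modes on the return trip is an assumption of this sketch that follows from reversing the journey; the coordinates are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Journey:
    origin: tuple          # (latitude, longitude)
    destination: tuple     # (latitude, longitude)
    starting_mode: str
    ending_mode: str


def with_return_trips(journeys):
    """Return the journeys plus a mirrored copy of each one (B to A for every A to B)."""
    mirrored = [
        replace(
            journey,
            origin=journey.destination,
            destination=journey.origin,
            starting_mode=journey.ending_mode,  # assumption: modes are used in reverse order
            ending_mode=journey.starting_mode,
        )
        for journey in journeys
    ]
    return list(journeys) + mirrored


morning = Journey((42.46, -70.95), (42.35, -71.08), "Commuter Rail", "Light Rail")
print(len(with_return_trips([morning])))  # 2
```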

We then used the k-nearest-neighbors algorithm for each journey in the 2015-17 Rider Census to select the ten most similar origin-destination pairs. We determined similarity on the basis of a passenger’s origin and destination locations. The origin location would be the latitude-longitude coordinates of the street intersection nearest to the passenger’s home, and the destination location would be the latitude-longitude coordinates of the street intersection nearest to their workplace. The ten most similar journeys were determined using Euclidean distance in four dimensions: the longitude and latitude of the passenger’s origin point and the longitude and latitude of the passenger’s destination point. We calculated the percentage of the ten most similar journeys that belonged to the same category. That measure is the propensity for substitution. For a given origin-destination pair, if the services used by passengers vary greatly, the percentage approaches 0%. If they do not vary, the percentage approaches 100%.
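The calculation described above can be sketched with a brute-force neighbor search. This is an illustration under stated assumptions rather than our production code: the function name is hypothetical, each journey is compared against every other journey, a journey is not counted as its own neighbor, and distance is plain Euclidean distance on the four raw coordinates. The example uses made-up points and k=2 so the result is easy to check by hand.

```python
import math


def same_category_share(journeys, k=10):
    """For each journey, return the share of its k nearest neighbors in the same category.

    `journeys` is a list of (point, category) pairs, where point is
    (origin_lat, origin_lon, destination_lat, destination_lon).
    """
    if len(journeys) < 2:
        raise ValueError("need at least two journeys to find neighbors")
    shares = []
    for i, (point, category) in enumerate(journeys):
        # Distance from this journey to every other journey.
        others = [
            (math.dist(point, other_point), other_category)
            for j, (other_point, other_category) in enumerate(journeys)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        same = sum(1 for _, other_category in nearest if other_category == category)
        shares.append(same / len(nearest))
    return shares


toy_journeys = [  # made-up coordinates, not survey data
    ((0.0, 0.0, 1.0, 1.0), "Bus to Bus"),
    ((0.0, 0.1, 1.0, 1.0), "Bus to Bus"),
    ((0.1, 0.0, 1.0, 1.0), "Bus to Heavy Rail"),
    ((5.0, 5.0, 6.0, 6.0), "Bus to Bus"),
]
print(same_category_share(toy_journeys, k=2))  # [0.5, 0.5, 0.0, 0.5]
```

One caveat of working in raw degrees is that a degree of longitude covers less ground than a degree of latitude at Boston's latitude, so projecting the coordinates first would give distances that are more even in both directions.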

Next, we mapped the substitution metric in QGIS. The survey data was converted to a spatial point data set, with the location of the point determined by the latitude-longitude coordinate of the origin location. We duplicated the survey data while reversing the origin locations and the destination locations, effectively mapping every journey as two points: one representing the origin location and the other representing the destination location. Adjacent points were grouped into 500m hexagons, and the average propensity for substitution was calculated for each hexagon. At 100%, the ten nearest neighbors of journeys that started and ended in that hexagon were taken using the same MBTA service, on average. Alternatively, at 0%, the ten nearest neighbors of journeys that started and ended in that hexagon were taken using different MBTA services.
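The hexagon grouping itself was done in QGIS, but the idea can be sketched in Python. The sketch below assumes the points have already been projected to coordinates in meters, uses pointy-top hexagons indexed by axial (q, r) coordinates, and treats `size_m` as the distance from a hexagon's center to its corner. How "500m" maps onto that size is an assumption here, as are the function names.

```python
import math
from collections import defaultdict


def hex_cell(x_m, y_m, size_m):
    """Return the axial (q, r) index of the pointy-top hexagon containing a point."""
    q = (math.sqrt(3) / 3 * x_m - y_m / 3) / size_m
    r = (2 / 3 * y_m) / size_m
    # Round in cube coordinates (a + b + c == 0), then recompute the component
    # with the largest rounding error so the constraint still holds.
    a, c = q, r
    b = -a - c
    ra, rb, rc = round(a), round(b), round(c)
    da, db, dc = abs(ra - a), abs(rb - b), abs(rc - c)
    if da > db and da > dc:
        ra = -rb - rc
    elif db > dc:
        rb = -ra - rc
    else:
        rc = -ra - rb
    return ra, rc


def average_by_hexagon(points, size_m=250.0):
    """Average a value per hexagon; `points` holds (x_m, y_m, value) tuples."""
    totals = defaultdict(lambda: [0.0, 0])
    for x_m, y_m, value in points:
        cell = totals[hex_cell(x_m, y_m, size_m)]
        cell[0] += value
        cell[1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


neighbor_center_x = 250.0 * math.sqrt(3)  # center of the hexagon one step east
sample = [(0.0, 0.0, 1.0), (10.0, 5.0, 0.5), (neighbor_center_x, 0.0, 0.2)]  # made-up values
print(average_by_hexagon(sample))  # {(0, 0): 0.75, (1, 0): 0.2}
```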



A few interesting trends are shown in the substitution map above. Immediately beyond the terminal stations of the Red and Orange Lines, the metric approaches 0%; this is probably because some passengers choose to walk to the Red and Orange Line stations, while other passengers choose to take a bus. Near terminal stations that have large average walk distances, many passengers choose to take other MBTA services rather than walk, since not everyone finds such a long walk acceptable. Another interesting observation is that substitution near Andrew and Broadway, the two Red Line stations that serve South Boston, is relatively low; this is most likely because passengers are choosing to take one of the many bus routes rather than the Red Line. In fact, the eastern half of South Boston has a cluster of hexagons with percentages over 80%, meaning that the bus route is practical enough that passengers forgo the walk to Broadway or Andrew.

To illustrate the usefulness of this approach, we conducted an analysis focused specifically on South Boston. Five bus lines converge on City Point at the edge of South Boston: Routes 5, 7, 9, 10, and 11. We filtered the survey data to identify trips that started or ended with one of those bus lines (n=696), and since the survey data is biased towards morning trips, duplicated the survey data while flipping the starting and ending locations. We then applied the same k-nearest-neighbors algorithm to the data, and mapped the data using the same procedure. The resulting data showed the same cluster around City Point where all five of the bus lines converge.

Subsequently, we grouped the individual points using the k-nearest neighbor algorithm into twenty clusters. The four variables we used to cluster the data were the origin location latitude, origin location longitude, destination location latitude, and destination location longitude. We filtered out the clusters with fewer than 20 data points, leaving twelve clusters, which enabled us to identify unusual trip patterns and ignore them. For each usable cluster, we calculated the average substitution percent and plotted the clusters as lines, with the endpoints of the lines representing the average origin and destination locations of passenger journeys in that particular cluster.
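Once every journey carries a cluster label, the filtering and averaging steps can be sketched as follows. The sketch takes the labels as given and does not perform the clustering itself; the function name, the row layout, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def summarize_clusters(rows, min_points=20):
    """Summarize clusters that have at least `min_points` journeys.

    `rows` holds (cluster_id, (origin_lat, origin_lon),
    (destination_lat, destination_lon), substitution_share) tuples.
    """
    grouped = defaultdict(list)
    for cluster_id, origin, destination, share in rows:
        grouped[cluster_id].append((origin, destination, share))

    summaries = {}
    for cluster_id, members in grouped.items():
        if len(members) < min_points:
            continue  # too few journeys to describe a usual trip pattern
        origins = [origin for origin, _, _ in members]
        destinations = [destination for _, destination, _ in members]
        summaries[cluster_id] = {
            "n": len(members),
            # The two averaged endpoints define the line drawn for the cluster.
            "mean_origin": (fmean([o[0] for o in origins]), fmean([o[1] for o in origins])),
            "mean_destination": (
                fmean([d[0] for d in destinations]),
                fmean([d[1] for d in destinations]),
            ),
            "mean_substitution": fmean([share for _, _, share in members]),
        }
    return summaries


rows = [  # made-up rows
    ("a", (0.0, 0.0), (2.0, 2.0), 0.5),
    ("a", (2.0, 2.0), (4.0, 4.0), 1.0),
    ("b", (9.0, 9.0), (9.0, 9.0), 0.1),
]
print(summarize_clusters(rows, min_points=2))
```

With `min_points=2`, cluster "b" is dropped and cluster "a" is summarized by its two averaged endpoints and its average substitution share.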

The resulting map illustrates that passengers using the bus network whose journeys start or end near the western portion of South Boston typically use the same bus route. Passengers whose journeys begin near Andrew or Broadway, however, use different bus routes for the same journey. This is potentially a sign that some of the bus routes in South Boston could be consolidated without substantially impacting passenger experience.


In the last two posts, we have used the Rider Census data set to examine how people access transit in greater detail than is usually possible. First, we found that the distance traveled to access transit on foot varies much more than the commonly applied rule of thumb of ½ mile. In this post, we found that people, perhaps unsurprisingly, use different transit services when they have multiple options. Importantly, we do not know from this analysis if an individual might choose different services on different days, nor the reasons why they might choose one service over another. Future analysis can examine these questions, using this and other survey data.

To use the MBTA, passengers typically have to walk, drive, or otherwise travel between our stations and their homes, offices, and schools. The question of how passengers travel between stations and their ultimate origin or destination is called the “last mile problem.” Typically, when the MBTA tries to answer questions involving the last mile problem (e.g., determining how many jobs are within walking distance of T stations), we assume that passengers won’t walk more than half a mile. However, studies of walking distances of different subway networks have found that walk distances vary considerably from station to station. In this blog post, we are going to explore how walk distances may vary from station to station in our MBTA network. 

For this post, we’re using survey answers from our most recent Rider Census, where passengers were asked to provide information about their most recent trip on the MBTA, including the location of their origin and destination. This provides us an opportunity to calculate how far passengers walk between their ultimate origins and destinations and MBTA stations. For each rail station and Silver Line station, as well as for each bus line, we used bootstrapping to calculate a confidence interval for the average distance passengers walk to and from MBTA stops. We then focused our analysis on the Red and Orange Lines, and identified three interesting trends: passengers walked longer distances to reach stations at the ends of the Red and Orange Lines, passengers walked shorter distances to stations constrained by bodies of water, and passengers walked shorter distances to stations in the middle of the Orange Line. 

Methodology and Data Sources

As mentioned, the MBTA and CTPS recently conducted a systemwide passenger survey. For the survey, we asked passengers about their most recent trip on the MBTA. The survey asked passengers to list their origin and destination locations: where they were coming from before arriving at an MBTA stop or station, and where they were traveling to after completing their trip on the MBTA. They were able to classify these locations in a variety of ways, like home, workplace, school, etc. The survey then asked passengers to list their mode of travel (driving, walking, biking, or use of a non-MBTA service) when going to and from the T in order to learn more about this “last mile.” Passengers listed the specific MBTA service they used (e.g. Green Line, bus route 7, Fitchburg Line, etc.) and at what specific stops they boarded and alighted. Passengers also provided basic demographic information.

Not every respondent provided an origin or destination location, so we separated the dataset into two groups: responses that included an origin location, and responses that included a destination location. (Responses that included both an origin and destination location were counted twice.) Since we are investigating walkability, we filtered the datasets so that they only contained responses from passengers who identified their access and egress modes as walking. This left 15,934 responses from passengers who identified an origin location and walked to their first MBTA boarding and 18,161 responses from passengers who identified a destination location and walked from their last MBTA alighting.

For each of the responses, we calculated the walk distance as the straight-line distance in meters between the origin or destination location and the location where the passenger boarded their first MBTA service or exited their last one. In cases where passengers were using rail or Silver Line service, the survey identified the exact stop at which passengers boarded and exited the service. However, in cases where passengers were using bus service, the survey did not identify the exact stop at which passengers boarded and exited; the survey only identified the bus line that passengers took. Therefore, we assumed that bus passengers would walk to the bus stop closest to their origin or destination location, and used the bus stop nearest to the passenger’s origin or destination location to calculate the walk distance.
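A straight-line distance and a nearest-stop lookup can be sketched as below. The haversine (great-circle) formula is an assumption of this sketch; the post does not say which formula or projection was used, and over walking distances the choice makes little difference. The function names, stop names, and coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def straight_line_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude-longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop(location, stops):
    """Return (stop_name, distance_m) for the stop on a route closest to a location.

    `stops` maps stop names to (latitude, longitude) for the stops of the bus
    line the passenger reported using.
    """
    return min(
        ((name, straight_line_distance_m(*location, *coords)) for name, coords in stops.items()),
        key=lambda pair: pair[1],
    )


route_stops = {"Stop A": (42.3350, -71.0400), "Stop B": (42.3380, -71.0300)}  # made-up stops
origin = (42.3352, -71.0395)                                                  # made-up origin
name, distance_m = nearest_stop(origin, route_stops)
print(name, round(distance_m))
```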

Finally, we filtered out stops and bus lines that had fewer than thirty data points. Few Green and Blue Line stations had more than thirty data points, whereas the Red and Orange Line stations all had more than thirty data points each. Therefore, we decided to focus on the Red and Orange Lines for the purposes of this blog post. We mapped the mean and median walk distances for the Red and Orange Lines in QGIS (we did not map the walk distances for Downtown Crossing, as that station is shared by the Red and the Orange Line).
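The confidence intervals reported for each station come from bootstrapping. A minimal percentile-bootstrap sketch is shown below; the number of resamples, the 95% level, the percentile method, and the function name are assumptions of this sketch, since the post does not state them, and the walk distances are made up.

```python
import random
from statistics import fmean, median


def bootstrap_ci(values, statistic=fmean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(statistic(rng.choices(values, k=n)) for _ in range(n_resamples))
    tail = (1 - confidence) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - tail) * n_resamples))]
    return lower, upper


walk_distances_m = [120.0, 310.0, 95.0, 640.0, 280.0, 455.0, 150.0, 900.0, 330.0, 210.0]  # made-up
print(bootstrap_ci(walk_distances_m, statistic=fmean))
print(bootstrap_ci(walk_distances_m, statistic=median))
```

Because the median of a resample can only take values drawn from (or midway between) the observed distances, a bootstrap interval for the median often has a bound equal to the median itself.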

Possible limitations of the data include:

  • The number of responses for each station and line is not proportional to the ridership of the respective stations/lines.
  • Women, English speakers, and regular MBTA passengers were more likely to respond to the survey.
  • Because passengers were asked to describe their most recent trip, the survey responses were often biased towards trips taken in the morning.


Line         Station            Data Points  Mean Walk Distance (m)  Mean Lower CI  Mean Upper CI  Median Walk Distance (m)  Median Lower CI  Median Upper CI
Orange Line  Assembly Station   57           513.657                 437.840        587.424        320.230                   200.572          320.230
Orange Line  Back Bay           388          497.198                 359.258        603.427        331.830                   311.550          333.988
Orange Line  Downtown Crossing  387          462.723                 366.399        545.144        301.413                   286.933          330.088
Orange Line  Forest Hills       119          712.659                 578.353        839.518        429.573                   325.896          507.278
Orange Line  Malden Center      151          718.944                 527.984        852.199        561.643                   535.360          579.449
Orange Line  Mass Ave           221          466.500                 382.312        532.923        261.808                   199.563          261.808
Orange Line  North Station      303          558.156                 299.258        709.093        272.793                   234.126          272.793
Orange Line  Oak Grove          77           1142.579                778.101        1446.468       716.906                   647.668          904.608
Orange Line  Sullivan Square    90           674.677                 547.066        786.468        432.475                   334.869          433.871
Red Line     Alewife            188          832.947                 647.254        980.625        658.636                   558.152          727.017
Red Line     Charles MGH        934          261.047                 238.093        280.972        175.378                   140.958          175.378
Red Line     Davis Square       462          787.405                 493.758        961.490        585.864                   544.225          650.085
Red Line     Downtown Crossing  501          526.546                 412.159        621.614        306.902                   290.354          339.374
Red Line     Kendall Square     1218         421.666                 400.187        441.639        315.829                   315.829          315.829
Red Line     South Station      684          424.869                 354.815        484.151        264.058                   219.488          282.041


Confidence Interval = CI

* Click the link to view the above table with more stations listed.

An image of the mean walk distances for the Red and Orange Lines.

An image of the median walk distances for the Red and Orange Lines.


There are a number of interesting conclusions that can be drawn from the mean and median walk distances from each station. We have tried classifying them into a few main trends as explained below:

Physical Landscape — Safety & Geography

The Charles MGH station is notable for having substantially lower median and mean walk distances than the other Red Line stations. There are a few possible explanations for this. First, the built environment of Charles MGH is particularly inconvenient for pedestrians: the station has only two crosswalks and two entrances, and is surrounded by busy roads. Pedestrians are also constrained by two geographic features, Beacon Hill (the hill, not the neighborhood) and the Charles River, which could limit how far pedestrians are able to walk to reach Charles MGH. Kendall Square, which has the fourth lowest mean walking distance out of all Red Line stations, is also adjacent to the Charles River, which provides further evidence that bodies of water like the Charles River affect station walkability. A similar effect can be seen at Assembly Station on the Orange Line, which is surrounded by the Mystic River and Interstate 93, and has lower mean and median walk distances than the adjacent stations (Sullivan Square and Malden).

Last Stations on Subway Lines

Stations at the ends of the Red and Orange Lines (Alewife, Davis, Forest Hills, Malden, and Oak Grove) tend to have larger average and median walk distances. This is probably because passengers who live beyond the reach of the Red and Orange Lines prefer them to alternative MBTA services (the bus network and the commuter rail), and are willing to walk farther to reach them.


The stations at the center of the Orange Line, beginning at around Mass Ave and ending at around North Station, tend to have lower medians and means compared to other Orange Line stations. There are a few possible explanations for this. First, this section of the Orange Line is not only very close to the E branch of the Green Line, but also runs parallel to it. This means that passengers can choose between the E branch and the Orange Line, and it’s likely that one of the factors that goes into that decision is which line has the closest station, so passengers are likely minimizing their walking distances along that section of the Orange Line. Another factor is that stations in the center of the Orange Line are particularly close together, which could affect how far passengers need to walk to reach an Orange Line station.