Behind the scenes with MBTA data.

Wollaston Station is located on the Braintree branch, wedged between North Quincy (to the north) and Quincy Center (to the south). During the month of October 2017, an average of 3,489 people tapped into Wollaston Station every weekday. A couple of months later, in the spring of 2018, the MBTA temporarily closed Wollaston Station for construction. Where did all the old Wollaston riders go?

How our riders changed their behavior matters to us for two main reasons. From the data perspective, we need to know how the closure affected our riders in order to update the passenger estimates in our origin-destination matrix. If riders switched to other stations, we need to adjust the passenger rates accordingly to keep the weighting for our reliability metrics as accurate as possible. For more detail on how these metrics are calculated, check out this blog post. From the service provider perspective, we know station closures are hard on our riders, particularly those who use the MBTA for regular commutes. Understanding how this particular closure affected our riders and our ridership can help us improve station closures in the future. For example, identifying patterns of ridership change enables us to better estimate demand for replacement services like shuttle buses.

For this analysis, we identified a core group of riders who frequently used Wollaston Station in October of 2017. We then captured how their travel patterns may have changed in April, after the station had closed. Finally, we scaled our results up to the total Wollaston ridership to get estimates for the raw number of riders for our metrics.  

The hypothesis: 

While Wollaston is closed, shuttle service is being provided to connect the station with its neighbors on the Red Line (North Quincy and Quincy Center). Riders who would usually use Wollaston may have switched to a different mode of public transit like bus, started using an alternative station nearby (whether going there directly or taking the shuttle), or simply stopped making trips on the MBTA altogether.  Given our interest in updating passenger rates at stations and in knowing how the disruption affected travel demand, we focused on identifying how many of our riders switched to other rapid transit stations, and how many we no longer see using the system. 

The most comprehensive data we have available are our riders’ CharlieCards. We used the anonymous unique identification numbers to see in aggregate who used Wollaston station when it was open, and what they chose to do after it closed. 

Setting the scene: 

In October of 2017, 14,605 unique CharlieCards tapped into the station. Analyzing the behavior of infrequent riders presented a myriad of issues because our data source does not allow us to capture intention. We could not determine if someone who boarded at another station would have used these stations normally, or if their behavior was because Wollaston Station closed. Instead, we strategically selected a group of “frequent Wollaston riders,” who regularly boarded at Wollaston Station before its closure. These frequent Wollaston riders did have identifiable travel patterns, and we could capture more confidently how these travel patterns changed. 

There are no formal definitions for what it means to be a frequent rider with the MBTA, which gave us the flexibility to set our own data-informed cutoff. Wollaston Station is surrounded by urbanized neighborhoods and has a parking garage with around 500 spaces. This suggests that the station is predominantly used by locals for their daily commute, rather than the one-time ridership we see at Airport or other stations near major tourist attractions. Using data from October 2017, we mapped out patterns in rider behavior. Of the cards that tapped into Wollaston Station in October, 12% tapped in only once. Our threshold for frequent riders needed to be high enough to reveal a recognizable pattern of travel behavior, yet low enough to still include a substantial number of riders. We set our cutoff at riders who tapped into Wollaston Station at least four times a week or at least 12 times in a month, homing in on riders who likely commute to work or school at least four of the five workdays each week. Eighteen percent of tickets and CharlieCards that tapped into Wollaston Station in October 2017 were classified as “frequent” under this definition. These 18% of cards accounted for over 60% of all taps, supporting our hypothesis about the prominence of commuting.
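
As a rough illustration of the cutoff described above, the sketch below counts taps per card for one month and flags cards at or above a threshold. It assumes a simple list of card IDs (one entry per tap) and uses the 12-taps-per-month reading of the cutoff; the function name, the card IDs, and the toy data are hypothetical and are not MBTA figures.

```python
from collections import Counter


def classify_frequent(taps, min_taps=12):
    """Return (frequent_cards, share_of_cards, share_of_taps) for one month.

    taps: list of card IDs, one entry per tap into the station.
    min_taps: assumed monthly cutoff for a "frequent" card.
    """
    if not taps:
        raise ValueError("no taps to classify")
    counts = Counter(taps)
    frequent = {card for card, n in counts.items() if n >= min_taps}
    share_of_cards = len(frequent) / len(counts)
    share_of_taps = sum(counts[card] for card in frequent) / len(taps)
    return frequent, share_of_cards, share_of_taps


# Toy data: one commuter card and four occasional cards.
taps = (["card_a"] * 20 + ["card_b"] * 3 + ["card_c"]
        + ["card_d"] * 2 + ["card_e"] * 4)
frequent, card_share, tap_share = classify_frequent(taps)
print(frequent, round(card_share, 2), round(tap_share, 2))
```

In the toy data a small share of cards produces most of the taps, which is the same shape as the pattern reported above (18% of cards, over 60% of taps).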

Given the static nature of the origin of commuters (it seems unlikely that anybody moved homes due to a temporary station closure), we figured that closing Wollaston would push riders to switch to the neighboring stations: North Quincy and Quincy Center. The MBTA does not require riders to tap their card on the shuttle system that currently runs between the stations, which means we weren’t able to tell if someone boarded the shuttle at Wollaston or accessed North Quincy or Quincy Center directly.  

The core of the analysis was thus to identify the cohort of “frequent riders” who used Wollaston Station when it was open in October 2017, and track in aggregate which station these riders turned to in April 2018, after the closure. 

The control: 

We needed a control station to establish an estimated CharlieCard churn rate at the station level. Riders regularly lose their cards, get new cards, or use multiple cards. Even though we may “lose” a portion of CharlieCards in our analysis, the riders themselves may still be making the same commute under a different card number. We expected to lose track of a portion of the frequent riders, even though we weren’t losing them from our system and their travel patterns weren’t changing. With no control, we would have dramatically overestimated the number of riders who left our system entirely. 

We initially selected Quincy Adams as our control station. We identified a cohort of frequent riders in October of 2017 using the same definition (CharlieCards that tapped into the station at least four times per week) and we counted how many of those same cards appeared as frequent riders in April of 2018. Sixty-six percent of the October frequent riders were still frequent riders in April, suggesting a natural churn rate over this time of 34%. One of our major assumptions in this analysis is that the travel patterns of these “churn” riders are not different from those of riders who kept the same CharlieCard for the whole time frame. When we ran the same analysis on Andrew, we discovered the churn rate at Andrew was significantly higher, nearly 40%.

The variation in churn rates in the control stations suggested that the churn is affected by geographical factors or other station-specific characteristics. After running several more parallel analyses, we decided that using the churn rate from Wollaston Station for the year before would be the least biased control. To avoid possible effects of seasonality (perhaps the nicer weather in October draws more people to use alternative modes), our baseline churn rate was calculated as the churn at Wollaston Station between October 2016 and April 2017. This churn rate, 32.5%, became our control. Our two assumptions are that the churn rate is the same the following year, and that the riders who are part of this churn do not behave differently from their counterparts. That is to say, any behavior patterns we see in riders who do keep their card can be extrapolated onto the group of riders who don’t.

Station        Period                Churn Rate*
Wollaston      Oct 2016 - Apr 2017   32.5%
Braintree      Oct 2017 - Apr 2018   32.8%
Quincy Adams   Oct 2017 - Apr 2018   34.0%
Andrew         Oct 2017 - Apr 2018   39.9%
Broadway       Oct 2017 - Apr 2018   34.6%

*percentage of frequent riders in October who do not appear as frequent riders the following April
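
The churn rate defined in the footnote can be sketched with set operations. This is a minimal illustration, assuming each month's frequent riders are available as a set of anonymized card IDs; the function name and the toy IDs are hypothetical.

```python
def churn_rate(before_frequent, after_frequent):
    """Share of the 'before' cohort that is not frequent in the 'after' month."""
    if not before_frequent:
        raise ValueError("the 'before' cohort is empty")
    retained = before_frequent & after_frequent
    return 1 - len(retained) / len(before_frequent)


# Toy cohorts: 10 frequent cards in October, 7 of them still frequent in April.
october = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"}
april = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c11"}
rate = churn_rate(october, april)
print(f"{rate:.1%}")
```

Cards that are new in the later month (such as `c11` here) do not affect the rate, because the denominator is always the earlier cohort.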

The analysis: 

The end analysis boils down to fractions with carefully defined numerators and denominators. Let’s start with defining the numerators.


Became frequent riders at:   Share of October frequent riders
North Quincy (NQ)            36.6%
Quincy Center (QC)            7.0%
Both stations                 0.1%
Total                        43.7%

The numerators are the percentages of our previously identified frequent riders who became frequent riders at the respective stations after the closure. Of all the riders identified as frequent at Wollaston Station in October, we estimate that 43.7% switched their travel patterns and began using one of the neighboring stations as a replacement for these trips: 36.6% switched to North Quincy (NQ), 7.0% switched to Quincy Center (QC), and 0.1% became frequent riders at both. A quick check confirmed that no riders were frequent at both Wollaston Station and either of the new stations (NQ and QC) in October; thus we can confidently say that this cohort shifted from frequently using Wollaston Station to frequently using either North Quincy or Quincy Center.
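
One way to compute numerators like these is to intersect the October cohort with each neighboring station's April frequent riders. The sketch below assumes those groups are available as sets of anonymized card IDs and treats "both" as its own exclusive category so the shares add up to the total; the names and toy data are hypothetical.

```python
def switch_shares(cohort, frequent_nq, frequent_qc):
    """Split a cohort into exclusive NQ / QC / Both shares (fractions of cohort)."""
    if not cohort:
        raise ValueError("cohort is empty")
    both = cohort & frequent_nq & frequent_qc
    nq_only = (cohort & frequent_nq) - both
    qc_only = (cohort & frequent_qc) - both
    n = len(cohort)
    shares = {"NQ": len(nq_only) / n, "QC": len(qc_only) / n, "Both": len(both) / n}
    shares["Total"] = sum(shares.values())
    return shares


# Toy data: 10 cohort cards; cards outside the cohort are ignored.
cohort = {f"c{i}" for i in range(1, 11)}
frequent_nq = {"c1", "c2", "c3", "c4", "x1"}
frequent_qc = {"c4", "c5"}
shares = switch_shares(cohort, frequent_nq, frequent_qc)
print(shares)
```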


Of the 100% of frequent riders:
Became frequent riders at other stations                           43.7%
Became infrequent riders at other stations                         14.0%
Control CharlieCard churn rate                                     32.5%
Unaccounted for (likely chose another mode or didn't take trip)     9.8%

The denominator is the pool of people we believe reacted to the station closure. This includes people who switched stations as well as people whose card data we no longer saw. To derive the number of people we didn’t see, we returned to the same cohort of frequent riders from October. We already calculated that 43.7% of riders switched to neighboring stations, and from our control group, we have an estimated CharlieCard churn rate of 32.5%. We also estimate that 14% of riders became infrequent for reasons unrelated to the closure. This rate was calculated using our control group of Wollaston Station the year before, and it appears to be fairly typical: the average drop to infrequency at the other four stations we tested came out to 17%. These are people who could have changed CharlieCards or commutes partway through the month. That leaves 9.8% of our frequent riders in October 2017 unaccounted for. We believe these 9.8% of riders are the people who changed modes or stopped making the trip because of the station closure. Thus, the total percentage of frequent riders who changed their behavior in response to the closure, either leaving or switching to neighboring stations, is the sum of 43.7% and 9.8%, or roughly 53.4%, allowing for rounding.
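
The accounting in this paragraph is simple subtraction, sketched below with the rounded percentages quoted in the text. The function name is hypothetical. With these rounded inputs the final sum comes out to 53.5 rather than the 53.4 used in the post, a gap that is consistent with the published inputs having been rounded.

```python
def account_for_cohort(switched, infrequent, churn):
    """Return (unaccounted, reacting) as percentages of the October cohort."""
    # Whatever is not explained by switching, infrequency, or card churn.
    unaccounted = round(100.0 - switched - infrequent - churn, 1)
    # Riders assumed to have reacted to the closure: switched or no longer seen.
    reacting = round(switched + unaccounted, 1)
    return unaccounted, reacting


unaccounted, reacting = account_for_cohort(switched=43.7, infrequent=14.0, churn=32.5)
print(unaccounted, reacting)
```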

Scale up: 

In order to scale our findings up to the entire population of Wollaston riders, we have to make two significant assumptions. The first is that the same patterns of behavior from these frequent riders apply to the frequent riders who were part of the churn rate. This is not too hard to believe, at least when looking at patterns in aggregate estimates. The second assumption is that infrequent riders would exhibit similar travel patterns in response to the station closure as frequent riders. Although this assumption is arguably weak, from a data perspective we have some wiggle room. Since frequent riders, while small in number, account for the majority of taps (over 60%), the lower level of precision for infrequent riders is unlikely to affect our results significantly. We ran a couple of additional analyses to ensure that riders who were counted as “frequent riders” at Wollaston Station in October of 2017 were not also “frequent riders” at North Quincy or Quincy Center during that same month. (They weren’t.)


               NQ       QC       Both     Not Seen
Numerator      36.6%    7.0%     0.1%     9.8%
Denominator    53.4%    53.4%    53.4%    53.4%
% of riders    68.52%   13.07%   0.14%    18.26%

So, where did all our Wollaston riders go? We believe 68% started using North Quincy, 13% started using Quincy Center, and 18% left for other modes, including other MBTA services. The difference between North Quincy and Quincy Center uptake makes sense given their geography: North Quincy is both closer to downtown and closer to Wollaston Station. If you haven't been to these stations, you should jump on the Red Line and head out to Quincy to explore! Hopefully 68.52% of you will hop off at North Quincy!
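
The scale-up step divides each numerator by the shared denominator. The sketch below uses the rounded numerators from the table and takes their sum as the denominator, which is an assumption; because of that rounding, the results land within a few tenths of a percentage point of the published shares rather than matching them exactly.

```python
def rider_shares(numerators):
    """Convert numerators (percent of cohort) into shares of reacting riders."""
    denominator = sum(numerators.values())
    if denominator <= 0:
        raise ValueError("numerators must sum to a positive value")
    return {name: 100 * value / denominator for name, value in numerators.items()}


# Rounded numerators from the table above.
numerators = {"NQ": 36.6, "QC": 7.0, "Both": 0.1, "Not Seen": 9.8}
result = rider_shares(numerators)
for name, share in result.items():
    print(f"{name}: {share:.1f}%")
```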

The MBTA is constantly working to improve its data quality, especially the data generated by our train tracking systems that affect our customer-facing feeds. Better data quality means more accurate real-time customer information and measurements of our performance that more accurately reflect passenger experiences. But this means that there will be discontinuities in our performance measures based on improvements to the underlying data, rather than changes in performance. This post explains changes made on September 12, 2018 to subway data that impact our performance measures.

In previous posts, we’ve explained how the MBTA tracks vehicles both in general and on the Green Line. We have vehicle tracking systems on almost all our vehicles, with different tracking systems for the different modes (heavy rail, light rail, bus, and commuter rail). These vehicle tracking systems produce real-time data feeds (some built by vendors, some built in-house) that are used to manage our service, measure our performance, view vehicle locations in real-time, and provide passengers with predictions of upcoming vehicle arrivals. We use a data fusion engine to combine the data feeds from each of these systems into one consolidated real-time feed to make it easier for our developers to work with our data. This consolidated feed is also the source of data for our performance tracking system that provides the data published on the MBTA Back on Track dashboard.

The existing software that produced the real-time data feed for heavy and light rail vehicles was a legacy codebase that was built in-house. It was functional for the basic application of providing subway predictions, but design decisions made during the initial development made it difficult or impossible to add new features or improve existing data quality. We have been working to replace the software in order to add new features and improve the accuracy of our locations and predictions. We went live with the Green Line portion of the update on February 8, 2018 and went live with the software for heavy rail (Red, Orange, and Blue lines) on September 12, 2018.

Some of the new and improved features include:

  • Inclusion of location information for trains at terminal stations
  • The flexibility to handle different types of shuttle-bus diversions, including ones that are created on-the-fly in response to incidents
  • Improvements to the accuracy of predictions for trains that are at terminal stations
  • General improvements to the accuracy of locations and predictions throughout the lines

Our previous heavy rail data feed did not include location information for trains at the terminal stations, and the passenger-weighted metrics did not take into account the passengers who were traveling to or from the end of the line. With the inclusion of location information for trains at heavy rail terminal stations, we now have accurate arrival times at terminal stations and can include these passengers in our metrics. Passenger-weighted reliability metrics for the Red, Orange, and Blue lines will more accurately reflect the customer experience. This will result in a decrease in the reliability metrics for the heavy rail lines of between 0 and 2%, depending on the line and the day.

In addition to the new data feed, we have built a new data fusion engine called Concentrate to combine the new real-time data feed for heavy rail and light rail with the feeds for commuter rail and bus into one consolidated feed for all modes. Concentrate enables higher-capacity, more frequent sharing of all MBTA real-time data. Concentrate went live for providing real-time information to third-party developers and customers in March 2018. We have been rolling it out for use as a source of data for internal systems over the last few months. We began using the data from Concentrate for the performance tracking system on September 12, 2018. It improves the update frequency of real-time information by up to 30 seconds in some cases and results in more accurate arrival and departure times throughout the lines. This was especially important for the Green Line, where many stations are close together and trains arrive frequently, so even a few seconds of delay, repeated over the course of the day, could result in many missing events.

Missing events create more problems for the Green Line because we are not currently able to identify them and remove false long wait times (as we do for the Red, Orange, and Blue lines), due to complexities with the Green Line schedule and other data limitations (described more here). Therefore, improving the accuracy of stop events (arrivals and departures) for the Green Line is very important in improving the accuracy of our passenger wait time metric. With Concentrate, passenger-weighted reliability metrics for the Green Line will more accurately reflect the customer experience. This will result in an increase in the reliability metrics for the Green Line of between 2 and 7%, depending on the branch and the day.
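
To see why a missing stop event produces a false long wait, consider the gaps between consecutive arrivals at one stop. The sketch below is only an illustration with toy timestamps in seconds; it is not the MBTA's wait-time calculation, and the function name is hypothetical.

```python
def headways(arrivals):
    """Gaps in seconds between consecutive arrival times at one stop."""
    ordered = sorted(arrivals)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Toy data: a train arrives every 5 minutes (300 seconds).
arrivals = [0, 300, 600, 900]
complete = headways(arrivals)

# If the 600-second arrival is never recorded, two gaps merge into one.
with_missing_event = headways([t for t in arrivals if t != 600])
print(complete, with_missing_event)
```

A single unrecorded arrival doubles the apparent gap at that stop, so service that actually ran on time looks like a long wait in the data.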

We will have to take these methodological changes into consideration when we are looking at heavy rail and light rail performance trends over time so that we can accurately attribute when increases in the wait-time measure are due to data improvements and when they are due to service improvements.