Behind the scenes with MBTA data.

Why Evaluate MBTA Coverage?

A key component of transit service planning is offering service to the largest number of people possible. Understanding how much of the population the MBTA currently covers, and where that population is located, is important to understanding how well the T is serving its constituents and where the MBTA should expand or modify its service. In 2017, the MBTA set coverage standards as part of its Service Delivery Policy.

One application of the coverage evaluation is the Better Bus Project, an ongoing initiative to improve bus service. As the MBTA focuses on bus planning, it is important to be able to evaluate the coverage impact of proposed changes to bus stops or routes. Automating the coverage evaluation lets us run it frequently and consistently, so that whenever changes are proposed, the T can quickly assess their coverage impact and compare it to that of other proposed changes.

Coverage Automation Tool Overview

To automate the coverage evaluation process, we (the Office of Performance Management and Innovation at the T) created a coverage evaluation tool using ArcGIS ModelBuilder. This tool uses census population data and location data for transit stops to compute the population of the area within walking distance of MBTA service. The model then calculates the percentage of the population covered by MBTA service in MBTA cities and towns by dividing the population within walking distance of transit stops by the total population of those cities and towns. This post will walk you through the MBTA’s 2017 base coverage analysis. Base coverage is the percentage of the total population of the MBTA cities and towns living within a 0.5-mile walk of any MBTA-operated or subsidized transit stop or station, regardless of the frequency or span of service provided.

Data Inputs for 2017 Base Coverage Analysis

Our first step to evaluating base coverage was downloading the most reliable data available for our analysis. This data includes:

  • All MBTA stops in the fall of 2017, downloaded as text files from GTFS
  • Route data (shapefiles) for MBTA privately operated/subsidized routes, for example, the Lexpress
  • American Community Survey 2016 5-Year Total Population Estimates, downloaded from American FactFinder
  • TIGER census block groups for the seven counties served by MBTA service
  • MBTA Towns from MassGIS
  • The area of water in Massachusetts
  • MassDOT Road Inventory 2017 Road Network


This analysis consisted of three components. We automated the process of finding:

  1. The area and population of all block groups within MBTA Cities & Towns
  2. The area within walking distance from MBTA transit stops and stations
  3. The population living within a 0.5-mile walking distance of MBTA transit stops and stations, calculated as the percentage of each block group’s population within the walkshed, assuming that the population is evenly distributed.

Step 1: Finding the Area and Total Population of all MBTA Cities and Towns at Block Group Level

As block group population data comes as a spreadsheet from the American Community Survey, we first transformed block group population data into a spatial dataset. To do this, we joined ACS block group population data with TIGER block group geography. As all block group data is at the census level, we then clipped block group polygons to the shape of MBTA cities and towns. 

This step resulted in tiny slivers, as the block group polygon boundaries do not precisely overlap with the boundaries of the MBTA cities and towns. To delete these slivers, we created a buffer of 0.05 miles around the boundary of the MBTA cities and towns polygon and deleted all block group polygons located completely within this buffer. No real census block group is that small, so we knew all polygons deleted in this process were slivers.

After spatially displaying the population data at block group level, we then calculated the area of each block group in square miles. To get a better estimate of the area where people actually live, we erased water features first, then calculated the area. 
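
The table-to-geography join at the start of this step can be sketched in plain Python. This is only an illustration of the idea, not the ModelBuilder step itself; the field names (`GEOID`, `population`) and the sample records are hypothetical.

```python
# Hypothetical ACS rows: block group identifier and total population estimate.
# Identifiers are kept as text so they match the geography records exactly.
acs_rows = [
    {"GEOID": "250250001001", "population": 1200},
    {"GEOID": "250250001002", "population": 850},
]

# Hypothetical block group geography records (geometry omitted for brevity).
block_groups = [
    {"GEOID": "250250001001", "name": "Block Group 1"},
    {"GEOID": "250250001002", "name": "Block Group 2"},
    {"GEOID": "250250002001", "name": "Block Group 1"},
]


def join_population(geography, table, key="GEOID"):
    """Attach population to each geography record by matching on the key.

    Records without a match keep population None so they can be reviewed.
    """
    lookup = {row[key]: row["population"] for row in table}
    return [{**rec, "population": lookup.get(rec[key])} for rec in geography]


joined = join_population(block_groups, acs_rows)
# Block groups that found no population row, worth checking before clipping.
unmatched = [rec["GEOID"] for rec in joined if rec["population"] is None]
```

Listing the unmatched identifiers is a cheap check that the spreadsheet and the geography use the same identifier format before any spatial processing starts.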

Step 2: Finding the Area Within Walking Distance from MBTA Transit Stops and Stations

To calculate the base coverage area, first we downloaded all GTFS MBTA transit stops for the fall of 2017 as a text file. Next, we converted the stops from the text file into points using Esri’s Display GTFS Stops tool. The GTFS stops cover all MBTA-operated bus routes, but do not include flag stops along privately operated routes subsidized by the MBTA, like the Lexpress bus in Lexington. We estimated the location of these stops to be at all road intersections along the subsidized routes. To do this, we first created a network dataset using the MassDOT Road Inventory 2017 file. The resulting network dataset included a streets layer and a road junctions layer. We estimated the location of flag stops by selecting the road junctions 50 feet or less from the subsidized service routes. Our final stop file for base coverage included the GTFS stops merged with the selected road junctions.
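
The junction selection can be illustrated with a small point-to-line distance check. This is a sketch under simplifying assumptions, not the ArcGIS selection we ran: coordinates are assumed to be in a projected system measured in feet, the route is a simple polyline, and all names and sample points are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b; all coordinates in feet."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def junctions_near_route(junctions, route, max_feet=50.0):
    """Return the junctions within max_feet of any segment of the route."""
    segments = list(zip(route, route[1:]))
    return [
        j for j in junctions
        if any(point_segment_distance(j, a, b) <= max_feet for a, b in segments)
    ]


# Hypothetical subsidized route and road junctions.
route = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 800.0)]
junctions = [(500.0, 30.0), (500.0, 120.0), (1040.0, 400.0), (1000.0, 860.0)]
flag_stops = junctions_near_route(junctions, route)
```

Here the junctions 30 and 40 feet from the route are kept as estimated flag stops, while the ones 120 and 60 feet away are dropped.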

We used ArcGIS Network Analyst to calculate the area within a 0.5-mile walking distance, along Massachusetts roads, of all MBTA stops and stations. We used the Road Inventory network dataset mentioned previously as the network dataset for the analysis and input the stops as facilities into Network Analyst. Our network analysis resulted in a layer of dissolved polygons around every MBTA stop or station. This is the MBTA 2017 coverage area.
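
To give a feel for what a network-based walkshed means, here is a minimal sketch using a multi-source shortest-path search with a distance cutoff. It is a stand-in technique for illustration, not a description of how Network Analyst works internally: it only finds which road junctions are reachable, does not handle partial road segments, and does not build polygons. The toy road graph and its lengths are hypothetical.

```python
import heapq

HALF_MILE_FEET = 2640.0


def reachable_within(graph, sources, cutoff):
    """Multi-source Dijkstra search.

    Returns the shortest network distance from any source to each node,
    keeping only nodes whose distance is within the cutoff.
    """
    best = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        dist, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a shorter or equal distance
        best[node] = dist
        for neighbor, length in graph.get(node, []):
            new_dist = dist + length
            if neighbor not in best and new_dist <= cutoff:
                heapq.heappush(heap, (new_dist, neighbor))
    return best


# Hypothetical road network: node -> list of (neighbor, length in feet).
roads = {
    "A": [("B", 1000.0)],
    "B": [("A", 1000.0), ("C", 1500.0), ("D", 2000.0)],
    "C": [("B", 1500.0)],
    "D": [("B", 2000.0), ("E", 500.0)],
    "E": [("D", 500.0)],
}
walkshed_nodes = reachable_within(roads, sources=["A"], cutoff=HALF_MILE_FEET)
```

With a stop at junction A, junctions B and C fall inside the half-mile walk while D and E do not, even though D might be closer than C in a straight line. That difference between network distance and straight-line distance is why we used a walkshed rather than a simple circular buffer.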

Step 3: Finding the Population Living within the MBTA Base Coverage Area 

To find the total population in our coverage area, we clipped the block group polygons, with the population and area attributes created in step 1, to the coverage area created in step 2. We then recalculated the area of the clipped polygons to get the area of each block group covered by service. Dividing that covered area by the block group’s total area gave the share of each block group within our coverage area. Assuming the population is evenly distributed, we estimated the population covered in each block group as that share multiplied by the block group’s total population. We then summed the covered population across all block groups. The final coverage percentage is the covered population divided by the total service area population.
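
The area-weighted calculation in this step can be written out as a short sketch. The field names and the three sample block groups are hypothetical; the arithmetic follows the even-distribution assumption described above.

```python
def covered_population(block_groups):
    """Apportion each block group's population by its covered share of area."""
    total = 0.0
    for bg in block_groups:
        if bg["land_sq_mi"]:
            share = bg["covered_sq_mi"] / bg["land_sq_mi"]
        else:
            share = 0.0  # no land area left after erasing water
        total += share * bg["population"]
    return total


# Hypothetical block groups: land area after erasing water, area inside the
# coverage area after clipping, and total population.
block_groups = [
    {"land_sq_mi": 0.20, "covered_sq_mi": 0.20, "population": 1500},
    {"land_sq_mi": 1.00, "covered_sq_mi": 0.25, "population": 800},
    {"land_sq_mi": 2.00, "covered_sq_mi": 0.00, "population": 400},
]

covered = covered_population(block_groups)
service_area_population = sum(bg["population"] for bg in block_groups)
coverage_pct = 100.0 * covered / service_area_population
```

In this toy example, 1,700 of 2,700 people are counted as covered (about 63 percent), even though the covered land is only 0.45 of 3.2 square miles. The same effect, dense block groups sitting inside the walkshed, drives the gap between area covered and population covered in the real analysis.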

The Base Coverage Model

The model to automate the coverage analysis is shown below.

Flow Chart from ArcMap ModelBuilder showing the coverage model

Map of MBTA Base Coverage

Map of MBTA base coverage

As seen in the base coverage map, we found the coverage area using dissolved walkshed polygons. The polygons are jagged due to the location of walkable roads near transit stops. The area covered by MBTA service is efficiently located over the highest density parts of the service area, so though service only covers around half of the service area, it covers around 80% of the total population. 

Conclusion: Importance in Service Planning

As the MBTA strives to improve the service provided to its constituents living within its service area, the coverage tool can evaluate how proposed bus stop and route changes will affect the number of people receiving service. Further, the T can use different inputs into the coverage metric to understand how it is performing on different types of coverage. For example, we can use stops receiving high frequency service to see what percentage of our population receives frequent service, or we can look at vulnerable populations, instead of total populations, to see the number of vulnerable people covered by MBTA service. Related to the Better Bus Project, the T can see the percentage of the population covered by varying levels of bus service to see what populations receive different types of service. The coverage tool is flexible, quick and more reliable than conducting manual analyses, and will allow the T to continue to evaluate the quality and impact of its service improvements.   

The Federal Transit Administration (FTA) Title VI Circular (C 4702.1B) requires large transit providers to collect demographic, travel, and fare payment data about their riders using passenger surveys at least every five years. The MBTA, working with the Central Transportation Planning Staff, has just completed a systemwide passenger survey to collect necessary passenger demographic data for bus routes and rail stations. This project updates the 2008-2009 dataset and will be used for service planning, ridership analysis, and Title VI equity analyses.   

The MBTA knows this data is useful for many other research projects, so we are releasing an interactive tool that allows you to compare the results for stations and bus routes. You can also download the associated datasets. 

Screenshot from the Rider Census online tool

Survey Methodology

The survey responses were obtained through a combination of an online form that was available from late October 2015 to May 2017 and a paper form with mail-in option, distributed at MBTA stations and on board MBTA vehicles from March 2016 to March 2017. Approximately half of all completed forms were submitted by each method. The English version is shown below.

The survey plan called for obtaining responses at the route level for bus and ferry routes and at the station or line segment level for all other modes. A goal was to obtain at minimum enough responses from each route, station, or line segment to meet statistical requirements for a confidence level of 90 percent with a confidence interval of 10 percent. In cases where the number of responses was insufficient to meet these standards, results from two or more routes, stations, or segments serving the same general area were combined. 

To compensate for differences in response rates when comparing results from different lines or modes, the published results for each route, station, or segment are weighted in proportion to typical weekday total passenger boardings on the corresponding services based on recent count data. 
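
As a rough illustration of why this weighting matters, the sketch below combines results from two services in proportion to their boardings. The survey report describes the actual procedure; this is only one plausible reading of it, and the route names, boardings, and shares are hypothetical.

```python
def weighted_share(services):
    """Combine per-service response shares, weighted by weekday boardings."""
    total_boardings = sum(s["boardings"] for s in services)
    if total_boardings == 0:
        raise ValueError("no boardings to weight by")
    weighted_sum = sum(s["boardings"] * s["share"] for s in services)
    return weighted_sum / total_boardings


# Hypothetical routes: typical weekday boardings and the share of respondents
# giving a particular answer on each route.
routes = [
    {"route": "X", "boardings": 9000, "share": 0.40},
    {"route": "Y", "boardings": 1000, "share": 0.80},
]

combined = weighted_share(routes)
unweighted = sum(r["share"] for r in routes) / len(routes)
```

The weighted result is 44 percent, while a simple average of the two routes would give 60 percent and overstate the influence of the low-ridership route.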

More detailed methodology can be found in the survey report, available soon.

Data Considerations

Please consider the following in working with these data:

90/10 confidence and precision: Below the 90/10 level, data are not displayed (the tool shows “Insufficient Data”). The displayed routes and stations all meet at least this level of confidence and precision, but some services are near this threshold, and some are much more precise. The additional precision mostly comes from higher samples on high ridership routes and stations. In addition, we assumed the “worst case” of evenly split characteristics in order to evaluate the confidence and precision levels. Since some characteristics are not expected to be split evenly among riders, even data included at the 90/10 levels is likely to actually be more reliable than 90/10.  

However, because the confidence and precision levels can vary, it is important to take into account the possibly-wide interval range when comparing routes or stations to each other. Conclusions about differences among services are likely to be more reliable for higher-ridership routes and stations, or at aggregations of services (e.g. at the mode-level). In order to assist with this evaluation, the valid sample counts for each question are provided along with the weighted response data in the downloadable datasets. 
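
The 90/10 threshold and the worst-case assumption described above can be illustrated with the textbook normal approximation for a proportion. This is a sketch, not the formula from the survey report: it assumes simple random sampling, no finite population correction, and reads the 10 percent interval as plus or minus 10 percentage points.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.90):
    """Half-width of a normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


def required_sample(margin=0.10, p=0.5, confidence=0.90):
    """Smallest sample size whose margin of error is within the target."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


# Worst case: the characteristic is split evenly among riders (p = 0.5).
n_worst_case = required_sample()
moe_even = margin_of_error(n_worst_case)
# Same sample size, but a lopsided characteristic (p = 0.9).
moe_skewed = margin_of_error(n_worst_case, p=0.9)
```

Under these assumptions the worst case needs 68 valid responses, and the same sample gives a noticeably tighter interval when the characteristic is lopsided. The valid sample counts in the downloadable datasets can be passed to a function like `margin_of_error` to judge how much weight a difference between two services can bear.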

Check all that apply: Questions that allowed respondents to check multiple responses will have answer options that total more than 100 percent. These questions are:

  • Do you sometimes make this trip another way? and
  • How do you self-identify by race?
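
A tiny example shows why these totals exceed 100 percent. The responses and answer options below are hypothetical, and each share is computed as a percentage of respondents rather than of checked boxes.

```python
from collections import Counter

# Hypothetical responses to a check-all-that-apply question.
responses = [
    {"White"},
    {"Black or African American", "White"},
    {"Asian"},
    {"Asian", "Other"},
]

# Count how many respondents checked each option.
counts = Counter(option for answer in responses for option in answer)
# Express each option as a percentage of all respondents.
shares = {option: 100.0 * n / len(responses) for option, n in counts.items()}
total_pct = sum(shares.values())
```

Four respondents checked six boxes in total, so the option shares sum to 150 percent.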

Trip-specific information: Some questions have wording that is trip-specific and cannot be generalized to MBTA use overall, including:

  • “Fare payment” applies to the reported trip, not to fare payments overall
  • “Trip frequency” applies to the reported trip, not to the frequency of riding the MBTA
  • “Alternative modes” refers to alternative modes for the reported trip, not for alternative modes to the MBTA in general 

Survey response bias: Some groups of people are more likely to respond to surveys than others. Disparities in results for these groups suggest a disparity in the response rates between the groups rather than such a large difference in the actual ridership population. Specifically, the gender disparities and English-speaking ability disparities are likely effects of response bias (women and English-speakers are more likely to respond to surveys) and not necessarily representative of the population. For these demographic elements, the reported values are likely to be biased, but the trends are likely reliable (i.e. a bus route with more women than another bus route likely does have more women, but the percentage of women on both bus routes is likely over-estimated). 

Additionally, we believe that visitors to the region and the MBTA are less likely to fill out and return surveys than regular riders. This response bias reveals itself most in the fare payment data – the portion of survey respondents who reported using monthly or seven-day passes is higher than the portion recorded by our fare system paying with these passes. This and other biases may show up in other results as well.

English Proficiency: The “Ability to Understand English” results cannot be assumed to provide an accurate measure of the percent of MBTA riders with little or no English proficiency because 99 percent of the returned survey forms used the English version, and forms were available in a limited number of other languages (Spanish, Portuguese, Cape Verde Creole, traditional and simplified Chinese, Vietnamese, French).

In collaboration with the Boston Area Research Initiative, the MBTA is holding a data challenge to see how students and researchers can creatively use the survey data to answer research questions. The winners of the data challenge will be invited to present their work at the BARI Spring 2018 conference on April 27th, 2018. 

Screenshot from the Rider Census tool

Data Challenge Logistics

The first rule is to read all the data caveats! After that, you are free to do whatever analysis interests you. To get you started, we have created a list of potential research questions (below). Feel free to combine this data with other datasets about Boston.

You may work on your submission as individuals or teams. Submissions are due at midnight at the end of April 16th, 2018. Please e-mail them, along with your contact information, to [e-mail address not shown]. You may also contact us at this address with any data questions you have as you work on the challenge.

Winners will be notified on April 20th, 2018 and invited to attend the BARI Spring 2018 conference and present their results. Winning submissions will also be featured on this very prestigious data blog.

Your submission can be a map, written analysis, an interactive tool, or whatever you think best conveys the analysis you did. 

Data Challenge Criteria

Submissions will be judged on the following criteria:

  • Accuracy of the analysis: Did you use the data correctly? Were your analyses methodologically sound and well-documented? Did you account for the caveats?
  • How compelling the research question is: Does the analysis reveal something that was not apparent at first glance? Does it confirm something we believed but weren’t sure about? The results do not have to be surprising to be compelling.
  • Presentation of the analysis: Is the deliverable easy to understand? Are the graphics clear? Are the graphics and tables helpful in understanding the results?

Potential Research Questions

To get you started, one of our interns did an analysis of how minority ridership on our bus routes compares to the demographics of the census tracts each route passes through.

Other ideas:

  • Where do MBTA rider demographics match (or not match) resident populations? Extend the above analysis to other demographics or look at it another way.
  • Which route/station has the most representative demographics of Boston? (It’s up to you how you want to define Boston geographically and how you want to define representative.)
  • Do the access modes to the stations reflect land use around those stations? Also think about parking availability and transfers (available in the dataset).
  • How does household vehicle availability match (or not) with usage? Are there spatial or demographic explanations for any mismatch?

Links to other datasets that might be useful

  • BARI data portal
  • Hubway data
  • City of Boston open data portal
  • MAPC Vehicle Census