Behind the scenes with MBTA data.

How to measure equity on high ridership bus routes

Anna is an M.A. candidate in Tufts’ Urban and Environmental Policy and Planning program and is one of OPMI’s interns for this semester. The following is a post she wrote about a class GIS project she did using recently-available MBTA Rider Census data. We’re currently holding a Data Challenge using this data – see this post for more details.

The MBTA follows Title VI of the Civil Rights Act of 1964, which protects people from discrimination based on race, color, or national origin. This means that when the MBTA considers service improvements and other service changes, it must make sure that these changes do not have a disparate impact on minority populations. To evaluate service changes for such impacts, the MBTA must understand the proportion of minority riders on each of its services.

There are currently two main ways to estimate minority ridership. The MBTA could either conduct a rider census survey to collect demographic information from a sample of riders on each bus route or at each station, or use U.S. Census data on the people living within each service's area to estimate minority ridership. A rider census survey is the preferred method, but it is resource- and time-intensive. U.S. Census data, by contrast, is easy to access and analyze, so the second method would be useful if it proved representative.

In this project, I sought to understand whether U.S. Census minority data within MBTA bus service areas is representative of minority ridership on MBTA buses. Essentially, the question is whether the minority make-up of the people riding a bus route matches the minority make-up of the neighborhoods the route travels through.

The MBTA recently conducted a Rider Census (the results are available here), so I compared the rider census survey data with U.S. Census minority population data for MBTA bus service areas.

How did I analyze the difference between the minority population riding the bus and living in bus service areas?

I examined all MBTA bus routes with a daily ridership of over 600 passengers, excluding the Silver Line. The analysis used MBTA Rider Census data, MassGIS bus stop and bus route data, and 2010 U.S. Census block and population data.

To find service areas for the MBTA's key bus routes, I ran a network analysis from each key route's bus stops along the Massachusetts street network. I then selected the census blocks that intersect the service area polygons generated by the network analysis.
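A network service area can be sketched as a multi-source shortest-path search: start from every stop at distance zero and walk outward along street segments until a distance limit is reached. This is a simplified stand-in, not how ArcGIS Network Analyst is implemented. The graph format (`edges` as `(node, node, length_m)` tuples), the 402.336 m cutoff (assuming the same quarter-mile walkshed the post mentions in its conclusion), and the rule that a block qualifies when any of its street nodes is reached are all assumptions for illustration; the function names are hypothetical.

```python
import heapq

QUARTER_MILE_M = 402.336  # 0.25 mile in meters (assumed walkshed size)


def network_walkshed(edges, stops, cutoff=QUARTER_MILE_M):
    """Return {node: walking distance in meters} for nodes within `cutoff` of any stop.

    edges: iterable of (u, v, length_m) street segments, treated as two-way.
    stops: iterable of node ids where bus stops are snapped to the network.
    """
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))

    dist = {}
    # Multi-source Dijkstra: every stop starts in the queue at distance 0.
    heap = [(0.0, s) for s in stops]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if node in dist:
            continue  # already settled via a shorter path
        dist[node] = d
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd <= cutoff and nbr not in dist:
                heapq.heappush(heap, (nd, nbr))
    return dist


def blocks_touching(walkshed, block_nodes):
    """Select blocks with at least one street node inside the walkshed.

    block_nodes: {block_id: [street node ids on the block boundary]} (hypothetical).
    """
    return {b for b, nodes in block_nodes.items() if any(n in walkshed for n in nodes)}
```

For example, on a chain of segments A–B (200 m), B–C (150 m), C–D (300 m) with a stop at A, nodes A, B and C fall inside the walkshed, while D (650 m away) does not.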

For all non-key bus routes with daily ridership over 600 passengers, I used a simpler method to find the service area. For these 121 routes, I selected the census blocks within a quarter-mile Euclidean radius of each route's bus stops to create the route's service area. I chose this method because it was faster to run on a large number of routes. Network Analyst represents walking distances more accurately, but the simpler method's results varied little from the Network Analyst results, so I used the Network Analyst method only for the key bus routes.
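The Euclidean selection can be sketched with plain geometry. This minimal version assumes block boundaries and stop locations are already in a projected coordinate system measured in meters, and keeps a block if a stop lies inside it or any block edge comes within a quarter mile of a stop, so that blocks only partly inside the radius are kept whole. The post used GIS tools, not this code; the function names and input formats are hypothetical.

```python
import math

QUARTER_MILE_M = 402.336  # 0.25 mile in meters


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, poly):
    """Ray-casting test: True if p lies inside the polygon (list of (x, y) vertices)."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def block_within_radius(block, stops, radius=QUARTER_MILE_M):
    """True if the block polygon intersects a circle of `radius` around any stop."""
    edges = list(zip(block, block[1:] + block[:1]))
    for stop in stops:
        if point_in_polygon(stop, block):
            return True
        if any(point_segment_distance(stop, a, b) <= radius for a, b in edges):
            return True
    return False


def euclidean_service_area(blocks, stops, radius=QUARTER_MILE_M):
    """Return sorted ids of blocks (dict of id -> vertex list) touching the radius."""
    return sorted(bid for bid, poly in blocks.items() if block_within_radius(poly, stops, radius))
```

With a stop at the origin, a block whose nearest corner is 300 m away is kept, while a block about 1.4 km away is not.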

After finding the service areas for the key routes and the other bus routes, I calculated the average U.S. Census minority percentage of the population living in each route's service area and compared it with the MBTA Rider Census minority percentage for that route. For each route, I divided the rider census minority percentage by the U.S. Census minority percentage. The resulting value expresses the minority share of riders as a percentage of the minority share of residents in the route's service area: 100% means the two shares are equal, and a higher value means minorities make up a larger share of riders than of residents. I then mapped these difference values for all bus routes, for several individual high-ridership routes, and for the routes with the highest values.
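The per-route comparison can be sketched in a few lines. This is a minimal illustration, not the author's GIS workflow: it assumes the service-area percentage is pooled across the selected blocks (total minority residents divided by total residents), and it uses the 86% and 115% cut-offs from the Figure 2 caption to label routes, treating exactly 115% as "higher" by assumption. The function names are hypothetical.

```python
def service_area_minority_pct(blocks):
    """Pooled minority percentage for a route's service area.

    blocks: iterable of (minority_residents, total_residents) pairs, one per
    selected census block (hypothetical input format).
    """
    blocks = list(blocks)  # allow generators, since we iterate twice
    minority = sum(m for m, _ in blocks)
    total = sum(t for _, t in blocks)
    return 100.0 * minority / total


def ridership_ratio(rider_minority_pct, census_minority_pct):
    """Rider minority share as a fraction of the resident minority share (1.0 = equal)."""
    return rider_minority_pct / census_minority_pct


def classify(ratio, low=0.86, high=1.15):
    """Label a route using the 86%-115% 'comparable' band from Figure 2."""
    if ratio >= high:
        return "higher"
    if ratio < low:
        return "lower"
    return "comparable"
```

With the Route 1 figures reported in the post, `ridership_ratio(36.7, 28.4)` is about 1.29 (129%); with the Route 93 figures, `ridership_ratio(30.3, 9.9)` is about 3.06. Both classify as `"higher"`.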

So, can U.S. Census data predict minority bus ridership?

The study found that the percentage of minority riders on MBTA buses is higher than the percentage of minority residents living in MBTA bus service areas; t(115) = 12.38, p < .001. On 86% of bus routes, the minority percentage riding the bus was at least 115% of the minority percentage living in the service area (Figure 1). Figure 2 below shows, in blue, the 7% of MBTA bus routes where the minority percentage living in the service area is comparable to the minority percentage riding the bus.
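The post does not say which test produced t(115) = 12.38. One plausible reading, assumed here, is a paired t-test on the per-route differences between rider and resident minority percentages, where 115 degrees of freedom would correspond to 116 routes. A minimal standard-library sketch of that statistic (the function name is hypothetical, and no p-value is computed):

```python
import math
import statistics


def paired_t(rider_pcts, census_pcts):
    """Paired t statistic and degrees of freedom for per-route differences."""
    diffs = [r - c for r, c in zip(rider_pcts, census_pcts)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    t = mean_diff / (sd / math.sqrt(n))
    return t, n - 1
```

A large positive t means rider minority percentages are consistently above the resident percentages across routes, which is the direction the study reports.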

Figure 1. Pie chart showing the percentage of bus routes where the minority share of bus riders is higher than, lower than, or similar to the minority share of the population living within the bus service area.

Figure 2: This map shows, in blue, the eight MBTA bus routes where the percentage of minority bus riders is comparable (between 86% and 115%) to the minority percentage of the population living in the bus service area.

For example, on Route 1, the MBTA bus route with the fourth-highest weekday ridership, 36.7% of bus riders are minority, but only 28.4% of people living in the Route 1 service area are minority. Figure 3 below shows the minority population percentage in the service area census blocks adjacent to Route 1 bus stops.

This analysis not only demonstrates that minorities ride the bus at disproportionately high rates compared with the non-minority population, but also shows where the difference between minority bus ridership and minority residents in bus service areas is greatest. Figure 4 maps the routes where the minority share of riders is greater than the minority share of residents in the service area (values between 100% and 360%).

Figure 4. This figure shows the bus routes with a percentage minority ridership higher than the percentage minority residents in the service area. Darker orange bus routes have a higher difference.

The routes with the highest difference between minority bus ridership and minority population in the service area are Routes 350, 134, 76, 230 and 93. On these routes, the percentage of minority bus riders is more than three times the percentage of minority residents living in the service area. Route 93 has the highest difference (Figure 5): 30.3% of its riders are minority, but only 9.9% of the population in its service area is minority. Route 93 runs through downtown between Haymarket and Sullivan Square.

Figure 5. Route 93 bus route and minority population distribution within its service area.

In Conclusion…

Based on this analysis, the minority population living within a bus route’s service area is not representative of minority ridership on buses travelling along the route, so the MBTA should continue to use rider census data, instead of U.S. Census data, to estimate minority bus ridership. 

This study’s methodology could be improved by automating the Network Analyst tool for creating bus service areas, so that the analysis could be run quickly on all bus routes, not only high-ridership routes. Further analysis could also explore alternative ways to select census block data within a 0.25-mile walkshed of a bus stop. The current analysis selects entire census blocks that fall even partly within the 0.25-mile walkshed, to avoid assuming that the minority population is evenly distributed within each block. Future analysis could examine how minority populations are distributed within census blocks to improve accuracy.
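The trade-off between whole-block selection and an even-distribution assumption can be made concrete. The sketch below compares the post's whole-block rule with area-weighted interpolation, a common alternative that scales each block by the share of its area inside the walkshed and therefore does assume residents are spread evenly within the block. The dictionary fields, including `share_inside`, are hypothetical inputs that a GIS overlay would have to supply.

```python
def whole_block_pct(blocks):
    """Post's approach: count every block touching the walkshed in full."""
    minority = sum(b["minority"] for b in blocks)
    total = sum(b["population"] for b in blocks)
    return 100.0 * minority / total


def area_weighted_pct(blocks):
    """Alternative: scale each block by the fraction of its area inside the walkshed.

    Assumes residents are spread evenly across each block.
    """
    minority = sum(b["minority"] * b["share_inside"] for b in blocks)
    total = sum(b["population"] * b["share_inside"] for b in blocks)
    return 100.0 * minority / total
```

For a block fully inside the walkshed (50 minority residents of 100) and one only a quarter inside (10 of 100), the whole-block rule gives 30% while area weighting gives 42%, so the choice of method can move the estimate noticeably.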

Future studies could also develop a way to predict minority bus ridership from U.S. Census minority percentages, for example by weighting each census block's minority percentage by the number of boardings at the stops within that block. A future prediction tool could also identify and incorporate other external factors that may affect minority ridership to produce a more accurate estimate.
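The boarding-weighted idea amounts to a weighted average. A minimal sketch, assuming each stop is paired with the minority percentage of the census block that contains it; the input format and function name are hypothetical, and this is a suggestion for future work rather than a method the study tested.

```python
def boarding_weighted_minority_pct(stops):
    """Weighted average of block minority percentages, weighted by stop boardings.

    stops: list of (boardings, minority_pct_of_block_containing_stop) pairs.
    """
    total_boardings = sum(b for b, _ in stops)
    return sum(b * pct for b, pct in stops) / total_boardings
```

A stop with 300 boardings in a 60% minority block and a stop with 100 boardings in a 20% minority block give a weighted estimate of 50%, compared with an unweighted average of 40%, because the busier stop counts more.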

Finally, this study gives the MBTA a general understanding of minority bus use in predominantly non-minority residential corridors. This could help the MBTA plan service changes that improve access for these riders.

How one-time events like parades and the calendar influence ridership numbers

The MBTA is constantly working to improve its data quality, especially the data generated by our train tracking systems. Better data means more accurate customer information and performance measures that better reflect passenger experiences. However, it also means that our performance measures will sometimes shift because the underlying data improved, not because performance changed. This post explains a change we just made to Green Line data that affects our performance measures.

Our data comes from many underlying systems and as we upgrade those systems, we are able to release new data feeds. In previous posts, we’ve explained how the MBTA tracks vehicles both in general and on the Green Line. The existing software is a legacy codebase that was built in-house. It is functional for the basic application of providing subway predictions, but design decisions made during the initial development have made it difficult or impossible to add new features or improve existing data quality.

We are in the process of replacing the software in order to add new features and improve the accuracy of our predictions. We went live with the Green Line portion of the update on February 8, and are continuing to work on updating the software for heavy rail (Red, Orange and Blue lines).

Some of the new and improved features include:

* The flexibility to handle different types of shuttle-bus diversions, including ones created on the fly in response to incidents

* Improvements to the accuracy of our predictions for trains that are at terminal stations

* General improvements to the accuracy of predictions throughout the lines

We began this project last year, when we wrote a new application to output real-time data for the Mattapan Trolley so that we could add countdown signs and provide location and prediction data for the trolley to app developers. Our next step was to expand this application to cover the Green Line.

As we prepared to launch this new source of Green Line data, we discovered a bug in the previous version of the software. The bug did not affect the location and prediction information sent to customers about Green Line trains, but it did mean that arrivals and departures at terminal stations were frequently not recorded by our performance system. As a result, the system showed many erroneous “long wait times” for passengers traveling to or from terminal stations. The new software significantly improves the accuracy of terminal arrivals and departures recorded in the performance tracking system.
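Why a missed event produces a false long wait can be shown with a toy example: if one departure is never recorded, the two gaps on either side of it merge into a single gap twice as long. The sketch below assumes departure times in minutes and a fixed scheduled headway; it is an illustration, not the MBTA's performance system, and the function names are hypothetical.

```python
def observed_headways(departures):
    """Gaps (minutes) between consecutive recorded departures at a stop."""
    times = sorted(departures)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def long_gaps(departures, scheduled_headway):
    """Recorded gaps longer than the scheduled headway."""
    return [h for h in observed_headways(departures) if h > scheduled_headway]
```

With trains leaving every 6 minutes at 0, 6, 12, 18 and 24, no gap exceeds the headway; drop the departure at 12 from the record and a 12-minute gap appears even though service ran as scheduled.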

As described in the blog post ‘February Green Line Reliability Data’, we cannot identify missing events and remove false long wait times on the Green Line as we do for the Red, Orange, and Blue lines, because of complexities in the Green Line schedule and other data limitations. Improving the accuracy of stop events (arrivals and departures) on the Green Line is therefore essential to improving the accuracy of our passenger wait time metric. We are also continuing to improve our performance tracking system so it can better distinguish real long wait times from those caused by missing data, whether from bad GPS reads or from events missed during the polling cycles between systems.

We deployed this new software on February 8, 2018, so for dates starting February 8, 2018, the reported percentage of Green Line customers experiencing wait times longer than the scheduled headway will more accurately reflect the customer experience. This will increase the Green Line's reliability metrics by 1% to 6%, depending on the branch and the day.
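A simplified version of the wait-time metric shows how removing false gaps moves the number. Assuming passengers arrive at a constant rate, a passenger arriving early in a gap longer than the scheduled headway H waits more than H; the share of such passengers is the total excess time over H divided by the total time covered. This constant-arrival model and the function name are assumptions for illustration, not the MBTA's published calculation.

```python
def share_waiting_longer(headways, scheduled_headway):
    """Fraction of passengers whose wait exceeds the scheduled headway.

    Assumes a constant passenger arrival rate: within a gap of length h,
    those arriving in the first (h - scheduled_headway) minutes wait longer.
    """
    total = sum(headways)
    excess = sum(max(0.0, h - scheduled_headway) for h in headways)
    return excess / total
```

Recorded gaps of 6, 6, 6 and 6 minutes against a 6-minute headway give 0%, while the same service recorded as 6, 12 and 6 minutes (one missed departure) gives 25%, which is the kind of false long wait the new software reduces.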

We will need to take this methodological change into account when looking at Green Line performance trends over time, so that we can accurately attribute changes in the wait-time measure to either data improvements or service improvements.