Behind the scenes with MBTA data.

A survey of overnight bus service using customer behavior and preferences.


In order to better understand the need for overnight transit service, the Office of Performance Management and Innovation, in collaboration with municipal and advocacy partners, surveyed potential passengers regarding their current overnight travel patterns and their preferences for overnight service.  

The survey supplements existing data by providing insight into potential passengers’ behaviors and preferences that could not be analyzed with our other data sources, such as their purpose of travel in the 1 a.m. to 5 a.m. time period, their likelihood and projected frequency of use of possible overnight service, and what service levels would have to be provided in order for potential passengers to actually use overnight buses. 

The MBTA is releasing the cleaned data from the survey, along with our initial analysis, in order to inform the conversation about overnight service. However, there are some caveats to understand about how the survey was conducted and how we cleaned the data. Several categories in the data were not included as options in the initial survey, but were added to the clean dataset when many respondents had submitted similar answers of interest. In addition, because survey respondents were not randomly selected and we did not receive demographically balanced responses, their behaviors and preferences are not generalizable to the entire MBTA service area and cannot be used to estimate ridership demand. 

Data Collection

The survey was conducted online during November and December of 2016 in eight different languages. Outreach was done online through Twitter and other forms of social media. In addition, in order to reach respondents who would not have been captured by the social media outreach, city partners advertised the survey in city halls and among employers (for distribution to their employees), and provided workstations with the online survey in other locations. Most responses were collected via the online instrument, but a few came in through paper surveys. 

Data Cleaning and Categorization

To check that spam responses were not included, we investigated every IP address associated with more than 50 responses. For each of these IP addresses, we verified that the attrition rate (the percentage of respondents who began the survey but did not finish it) was similar to the overall attrition rate, and that its responses overlapped in time (i.e., some responses were begun while others were in progress). A single spam respondent was removed for having checked “Other” and entered randomly typed letters whenever possible. 
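
The two checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure OPMI actually ran: the field names (`ip`, `finished`, `start`, `end`) and the `tolerance` used to decide what counts as "similar" are assumptions, since the source gives neither.

```python
from collections import defaultdict


def attrition_rate(responses):
    """Share of responses that were started but not finished."""
    if not responses:
        return 0.0
    unfinished = sum(1 for r in responses if not r["finished"])
    return unfinished / len(responses)


def flag_suspicious_ips(responses, min_responses=50, tolerance=0.10):
    """Return IPs with more than `min_responses` responses whose attrition
    rate differs from the overall rate by more than `tolerance`.

    `tolerance` is an assumed threshold; the source only says "similar".
    """
    by_ip = defaultdict(list)
    for r in responses:
        by_ip[r["ip"]].append(r)

    overall = attrition_rate(responses)
    flagged = []
    for ip, group in by_ip.items():
        if len(group) <= min_responses:
            continue  # only heavily used addresses are investigated
        if abs(attrition_rate(group) - overall) > tolerance:
            flagged.append(ip)
    return flagged


def has_overlapping_sessions(group):
    """True if any response began while an earlier one was still in progress."""
    if len(group) < 2:
        return False
    ordered = sorted(group, key=lambda r: r["start"])
    latest_end = ordered[0]["end"]
    for r in ordered[1:]:
        if r["start"] < latest_end:
            return True
        latest_end = max(latest_end, r["end"])
    return False
```

Overlapping sessions suggest a shared address, such as an office network or a public workstation, rather than one person submitting responses back to back.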

The overnight bus survey contained three main sections: a first section containing questions about the respondents’ current late night travel patterns, a second section containing questions about how respondents would use an overnight bus service, and a third section containing demographic questions. All respondents who did not answer any questions after the first block were removed from the dataset. This was determined to be the most appropriate bright line for inclusion because attrition between questions dropped sharply after this point in the survey. This process left 7,282 usable responses in the dataset.
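
As a rough sketch of this inclusion rule, the filter below keeps a response only if it has at least one non-empty answer after the first block. The question keys passed in as `later_questions` are hypothetical placeholders, not the column names in the released file.

```python
def answered_beyond_first_block(response, later_questions):
    """True if the respondent answered any question after the first block."""
    return any(response.get(q) not in (None, "") for q in later_questions)


def keep_usable(responses, later_questions):
    """Drop respondents who stopped at or before the end of the first block."""
    return [r for r in responses if answered_beyond_first_block(r, later_questions)]
```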

The data was then cleaned to reclassify respondents who had checked “Other” and then provided more information whenever possible. In many situations, respondents selected “Other” but then provided a specification that fit into one of the preexisting options in the survey. For example, respondents who specified that their late night trips begin “At a friend’s house” were appropriately reassigned to be “At a social or recreational activity” and respondents who mentioned taking trips by Uber or Lyft were reassigned as having chosen to “Take a taxi or use a rideshare company.” 
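
The reassignment described above could be sketched as a keyword lookup. This is only an illustration: the source does not say whether the reclassification was done by hand or by rule, and the keyword lists and the `reclassify` helper are assumptions built from the two examples given.

```python
# Hypothetical keyword rules, derived from the two examples in the text.
ORIGIN_RULES = [
    ("friend", "At a social or recreational activity"),
]
MODE_RULES = [
    ("uber", "Take a taxi or use a rideshare company"),
    ("lyft", "Take a taxi or use a rideshare company"),
]


def reclassify(choice, other_text, rules):
    """Map an "Other" write-in onto a preexisting option when a rule matches.

    Any answer that is not "Other", or that matches no rule, is returned
    unchanged so that it can be reviewed manually.
    """
    if choice != "Other" or not other_text:
        return choice
    text = other_text.lower()
    for keyword, category in rules:
        if keyword in text:
            return category
    return choice
```

Leaving unmatched write-ins as “Other” keeps the rule conservative: nothing is reassigned unless a listed keyword matches.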

Several new categories were also created when many respondents had all provided similar explanations. 

  • For the questions regarding where respondents’ trips usually begin or end, a new category called “Airport” was created for respondents who specified that they were traveling to or from the airport. Additionally, respondents who wrote that they regularly took trips in two or more of the given categories were placed into one of two new categories – “Multiple (Including Work)” or “Multiple (Not Including Work)” – depending on whether or not one of these purposes was for work.
  • For the question regarding what type of transportation the respondent currently uses to travel between 1 a.m. and 5 a.m., a new category called “MBTA” was created for respondents who specifically mentioned taking a bus or train that is still running, for example by writing “leave early enough to catch last 89.” In addition, a new category called “Overnight” was created for respondents who specifically mentioned that they used to take this trip using the MBTA, such as by writing “I used to take the T before they shut down late night service.”
  • For the question regarding what industry the respondent has to travel for between 1 a.m. and 5 a.m., a new category called “Entertainment” was created for anyone who specified that they were involved in theatre or another live entertainment event, and a new category called “Security” was created for anyone involved in public or private security.
  • For the question regarding optimal boarding time, two new categories were created – “Varies” for respondents who specified times fitting into multiple given ranges and “Varies Early” for respondents who specified times fitting into multiple given ranges that were all between 1 a.m. and 3 a.m. 
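
Two of the rules in the list above, the “Multiple” trip categories and the “Varies” boarding-time categories, are mechanical enough to sketch in code. The example below is an illustration under stated assumptions: the `work_label` value and the representation of time ranges as `(start_hour, end_hour)` tuples are hypothetical, and the survey’s actual range boundaries are not given in the text.

```python
def classify_trip_purposes(purposes, work_label="Work"):
    """Collapse two or more trip categories into one of the "Multiple" labels.

    `work_label` is a placeholder for whichever survey option denotes work.
    """
    unique = set(purposes)
    if len(unique) < 2:
        return next(iter(unique), None)  # zero or one category: leave as-is
    if work_label in unique:
        return "Multiple (Including Work)"
    return "Multiple (Not Including Work)"


def classify_boarding_ranges(ranges):
    """Label write-ins that span several boarding-time ranges.

    `ranges` holds (start_hour, end_hour) tuples in hours after midnight.
    Returns None when fewer than two distinct ranges are given, meaning the
    original answer should be kept.
    """
    unique = set(ranges)
    if len(unique) < 2:
        return None
    if all(start >= 1 and end <= 3 for start, end in unique):
        return "Varies Early"  # every range falls between 1 a.m. and 3 a.m.
    return "Varies"
```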

For the questions about gender and race, respondents who checked “Other” and put in specifics that were not a gender or race, such as “human,” were changed to “Prefer not to say.” For respondents who checked “Other” and explained that they were Hispanic, Latino, or Latina, the “Other” classification was removed if and only if they had also checked that they were White (Hispanic status is captured in a separate question from race). 
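
The race rule can be written as a small function, sketched here under assumptions. The `NON_ANSWERS` set is a hypothetical stand-in for whatever write-ins were judged not to be a race, and replacing the whole selection with “Prefer not to say” is one reading of the text, which does not say how other checked boxes were handled.

```python
HISPANIC_TERMS = ("hispanic", "latino", "latina")
NON_ANSWERS = {"human"}  # hypothetical list of write-ins that are not a race


def clean_race(selected, other_text):
    """Apply the "Other" cleanup rules to one respondent's race selections.

    `selected` is the collection of checked options; a new set is returned.
    """
    selected = set(selected)
    text = (other_text or "").strip().lower()
    if "Other" not in selected or not text:
        return selected
    if text in NON_ANSWERS:
        # Assumption: the entire answer becomes "Prefer not to say".
        return {"Prefer not to say"}
    is_hispanic = any(term in text for term in HISPANIC_TERMS)
    if is_hispanic and "White" in selected:
        # Hispanic status is captured by a separate question, so "Other"
        # is dropped only when White was also checked.
        selected.discard("Other")
    return selected
```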

Corrections were made to clean and standardize responses where possible. For example, ZIP codes containing the letter “O” instead of the number “0” were fixed. Finally, before releasing the data, fields were removed if they contained personal or identifying information. As a part of this process, all written responses that were submitted as specifications when respondents checked “Other” were removed, because some responses contained detailed information about respondents’ characteristics and behaviors. 
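
The ZIP code correction mentioned above might look like the following sketch. The `fix_zip` name and the choice to accept only five-digit results are assumptions; anything that does not become a valid five-digit code is returned untouched.

```python
import re


def fix_zip(raw):
    """Replace the letter O with the digit 0 when that yields a 5-digit ZIP."""
    candidate = raw.strip().upper().replace("O", "0")
    if re.fullmatch(r"\d{5}", candidate):
        return candidate
    return raw  # not recoverable by this rule; leave for manual review
```

Checking the result against a five-digit pattern prevents the substitution from altering free-text entries that merely contain the letter “O”.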

Data Caveats

In the coming weeks, OPMI will continue conducting and reporting on analysis of the survey results. However, as we release our data, we caution that any analysis and interpretation should take the limitations of the data into account. The sample of respondents was not randomly chosen and is not representative of the MBTA’s service area. It is a self-selected group of individuals who decided to take a (predominantly online) survey about a potential overnight bus service. The selection bias is evident in that more than 90 percent of survey respondents report traveling at least once every three months during the 1 a.m. to 5 a.m. time period, a finding unlikely to be generalizable to all residents of the MBTA service area. Beyond this initial selection bias, there is additional bias introduced by the demographics of the sample. Although the respondents were fairly demographically representative with regard to gender, race, and income, over 60 percent of respondents fit into the 22-34 age range. Finally, it is important to remember that over 70 percent of the respondents who reported currently taking trips between 1 a.m. and 5 a.m. made no mention of traveling for work-related purposes. Instead, most respondents reported traveling predominantly for social and recreational purposes, so analysis done on the sample as a whole speaks mainly to the characteristics and preferences of social travelers. 

Download Data

Download .CSV data file 

Download .XLSX codebook

Download slide deck with initial analysis (PDF)