Behind the scenes with MBTA data.

In the early days of the blog, we wrote a post explaining how we track and locate vehicles. But we never finished explaining the most complicated and interesting part: how we track vehicles on the Green Line. Since the Green Line has elements of light rail, streetcar and subway modes, we need to combine different data sources and methods to know where its vehicles are.


This post describes the technical details of tracking Green Line vehicles. The final output is included in the larger MBTA passenger data which feeds the new MBTA.com (see it in action here) as well as apps like the Transit app. You can read more about the format of the data here.

Note that passenger information is not captured in the vehicle locations feed. We combine this location information with data from our fare collection system to know where our passengers are boarding and infer their destinations via the ODX model.

The Hardware

There are a few different types of hardware that give us information on where Green Line trains are. Each has its own benefits and drawbacks, and each is used in different places along the lines.


AVIs

In the beginning, there were AVIs (Automatic Vehicle Identifiers). The AVI system includes sensors on the tracks called “key points” that read information from Green Line cars that pass by. When AVI equipment detects the presence of a train car, it communicates with the car via RFID and gets three pieces of information:

  • The ID number of the car (for example, 3878). These are painted on the inside and outside of all Green Line cars.
  • The ID number of the route that the car is on (more about this in a second).
  • Whether or not the car is coupled to another car.

The route information is more specific than what Green Line riders see: while most information is presented to the public as though there were only four branches, labeled “B”, “C”, “D”, and “E”, for internal purposes there are many more possibilities. For example, route 831 runs from Cleveland Circle to North Station, and is the typical C Branch routing, but other routes such as 833 (Cleveland Circle to Park Street) or 830 (Cleveland Circle to Lechmere) may be used throughout the day depending on dispatching needs.
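
As a rough illustration, an AVI reading could be modeled as a small record holding the three fields listed above. This is only a sketch: the names `AviReading`, `key_point_id`, and `describe` are hypothetical and are not taken from the MBTA's software. The route numbers and descriptions are the examples given in this post, not a complete list.

```python
from dataclasses import dataclass

# Route descriptions taken from the examples in this post; the real
# list of internal routes is longer and is not reproduced here.
ROUTE_DESCRIPTIONS = {
    830: "Cleveland Circle to Lechmere",
    831: "Cleveland Circle to North Station",
    833: "Cleveland Circle to Park Street",
}


@dataclass(frozen=True)
class AviReading:
    """One detection at a key point (hypothetical structure)."""
    key_point_id: str   # assumed identifier for the sensor location
    car_number: int     # e.g. 3878
    route_id: int       # internal route number, e.g. 831
    is_coupled: bool    # whether the car is coupled to another car


def describe(reading: AviReading) -> str:
    """Return a human-readable summary of a reading."""
    route = ROUTE_DESCRIPTIONS.get(reading.route_id, "unknown route")
    coupling = "coupled" if reading.is_coupled else "single car"
    return (
        f"car {reading.car_number} on route {reading.route_id} "
        f"({route}), {coupling}"
    )


reading = AviReading("example-key-point", 3878, 831, True)
print(describe(reading))
```

Keeping the internal route number on the record, rather than only the public branch letter, preserves the distinction between routings such as 831 and 833 that riders would both see as the C Branch.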

Track Circuits

A track circuit detects the presence (or absence) of a train on a particular section of track. Unlike an AVI, which is located at a specific, known point in space, a track circuit is a linear segment that could be hundreds of yards (even miles) long, although on the Green Line they are typically much shorter than that. Also unlike an AVI, track circuits only provide a “true” or “false” indication about whether or not a train is present. Any other information must be inferred from the other data sources. Only some sections of the Green Line have track circuits.


GPS

GPS is the third and final data source. Because AVI and track-circuit data are only available to us on certain sections of the Green Line, mostly underground, it was crucial to add another source that would cover the aboveground sections. GPS was the obvious answer. We began installing GPS devices and cellular modems on Green Line trains. A GPS device gives us the location and ID number of the train car that it is on, but no route data.

We were able to launch Green Line tracking earlier than expected, when only about half of Green Line cars had GPS installed, because of the way the MBTA puts together Green Line trains. There are two types of train cars on the Green Line, called “Type 7” and “Type 8” cars. Because Type 7 cars are not wheelchair-accessible, we require that each train have at least one Type 8 car, if at all possible. Therefore, we decided to install GPS on Type 8 cars first, which gave us a high degree of confidence that each train would have at least one GPS unit on it, and allowed us to launch the real-time data feeds for the Green Line. We have since begun installing GPS on Type 7 cars as well, which gives us additional redundancy in case the GPS unit on a Type 8 car fails.

The Software

Combining the three disparate data sources into a single data feed happens in a software application developed at the MBTA. There is a lot of detail that we won’t cover here, but the following is a brief summary of how we combine data from the different sources.

First, note the differences in the data that each source provides:

                Location   Car Number   Route
Track Circuit   X
AVI             X          X            X
GPS             X          X

AVIs are the only source that includes route information, so we have to trust them on that. If we do not have access to an AVI reading that we can associate with a particular train, we must fall back to a default route, which works reasonably well on the above-ground parts of the Green Line (for example, a train on Commonwealth Ave has a high likelihood of being on the Boston College-to-Park-Street route).
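
The fallback described above can be sketched as a simple lookup. This is a minimal illustration under stated assumptions: the function `resolve_route` and the table `DEFAULT_ROUTE_BY_AREA` are hypothetical names, and only the Commonwealth Ave entry comes from this post.

```python
from typing import Optional

# Hypothetical mapping from an above-ground area to the route assumed
# when no AVI reading is available. Only the Commonwealth Ave example
# comes from this post; the key and label are illustrative.
DEFAULT_ROUTE_BY_AREA = {
    "Commonwealth Ave": "Boston College to Park Street",
}


def resolve_route(avi_route: Optional[str], area: str) -> Optional[str]:
    """Prefer the AVI-reported route; otherwise use the area's default."""
    if avi_route is not None:
        return avi_route
    # No AVI reading for this train: fall back to the default, if any.
    return DEFAULT_ROUTE_BY_AREA.get(area)


print(resolve_route(None, "Commonwealth Ave"))
print(resolve_route("Cleveland Circle to North Station", "Beacon St"))
```

Returning `None` when there is neither an AVI reading nor a default keeps the "route unknown" case explicit instead of guessing.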

Car number information is available from both AVIs and GPS, and is also the way that we identify which train a particular piece of data is associated with. Often from GPS we will only get one of the car numbers (because only the Type 8 car will have GPS, as mentioned above). Until a train passes an AVI, we may only have knowledge of one of the two car numbers.
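
One way to picture this association is a registry that remembers which car numbers have been seen together. The sketch below is an assumption-heavy illustration: the class `TrainRegistry` and its `observe` method are hypothetical, and the post does not describe how readings are actually grouped into trains. The car numbers are examples only.

```python
from typing import Dict, FrozenSet, Iterable, Set


class TrainRegistry:
    """Tracks which car numbers are believed to belong to the same train."""

    def __init__(self) -> None:
        # Maps each known car number to the set of cars in its train.
        self._train_of_car: Dict[int, Set[int]] = {}

    def observe(self, car_numbers: Iterable[int]) -> FrozenSet[int]:
        """Record cars seen together and return the train's known cars."""
        train: Set[int] = set(car_numbers)
        # Pull in cars already known to travel with any observed car.
        for car in list(train):
            train |= self._train_of_car.get(car, set())
        for car in train:
            self._train_of_car[car] = train
        return frozenset(train)


registry = TrainRegistry()
print(sorted(registry.observe([3878])))        # GPS report from one car
print(sorted(registry.observe([3878, 3650])))  # both cars known after an AVI
print(sorted(registry.observe([3878])))        # later GPS report, same train
```

The sketch only ever merges cars into a train. Handling cars that are uncoupled and reassigned would need additional logic that is not covered here.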

Finally, location information is available from all three sources, which often overlap. We use the sources as follows:

  1. If track circuits are available, we use them because they are maintained at the highest level of reliability. Many of our track circuits are used to drive signals, which makes them a safety-critical system and hence a top priority for maintenance.
  2. If no track circuits are available and the train is aboveground, we use GPS. This is a good source for locations because it updates frequently and has a high level of precision.
  3. Finally, if neither track-circuit nor GPS data is available, we fall back to AVIs as a source of location data. AVIs are the least preferable source because key points are more spread out than track circuits, so a train’s location won’t be updated as frequently. This is mainly the case between Hynes and Haymarket stations on the Green Line; you’ll notice that locations of these trains update less frequently in the data feeds than at other places.
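
The priority order in the list above can be sketched as a short function. This is an illustration only, not the MBTA's implementation: the `Observations` record, its field names, and the string representation of locations are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observations:
    """Latest location from each source for one train (None if unavailable)."""
    track_circuit: Optional[str] = None  # e.g. name of the occupied segment
    gps: Optional[str] = None            # e.g. a "lat,lon" string
    avi: Optional[str] = None            # e.g. the last key point passed
    is_aboveground: bool = False


def choose_location(obs: Observations) -> Optional[Tuple[str, str]]:
    """Return (source, location) following the priority order above."""
    # 1. Track circuits are preferred wherever they exist.
    if obs.track_circuit is not None:
        return ("track_circuit", obs.track_circuit)
    # 2. Above ground without track circuits, use GPS.
    if obs.is_aboveground and obs.gps is not None:
        return ("gps", obs.gps)
    # 3. Otherwise fall back to the last AVI key point.
    if obs.avi is not None:
        return ("avi", obs.avi)
    return None


print(choose_location(Observations(gps="42.35,-71.13", is_aboveground=True)))
print(choose_location(Observations(avi="example-key-point")))
```

Returning the source name alongside the location makes it possible for downstream consumers to tell how fresh or precise a given position is likely to be.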


These three data sources, combined with the software that interprets them, allow our riders to see how long they’ll have to wait for the next train, whether they are looking at a countdown sign in a station, or checking their phone. In addition, this data feeds into our performance measurement systems, which are shown on the dashboard and are used by operations staff to monitor the system throughout the day.