Better data means better buses.

The Bus Observatory is a public archive of real-time data on vehicle movements and status, collected from transit systems around the world. This free service is provided by the Jacobs Urban Tech Hub at Cornell Tech.

Spring 2025 Update: Over the next few months we will be migrating all feeds to a new data collection infrastructure based on the gtfs-realtime-capsule project, and providing information on how to deploy your own feed collector. Stay tuned.

About the Bus Observatory

Real-time transit data is valuable for travelers, helping them use services more effectively and with less inconvenience. It is also valuable for analyzing larger trends and issues. However, this data is ephemeral: transit agencies lack the resources to archive and maintain logs of position data for use by the public.

The Bus Observatory, launched in 2020, seeks to fill this gap by retrieving and archiving systemwide snapshots of vehicle positions every minute, 24 hours a day, 365 days a year, for several transit systems around the world.

Our goal is to demonstrate the potential for longitudinal, cross-city studies that can inform both operations and long-term planning.

About The Data

All retrieved data is stored in a public AWS S3 bucket (busobservatory-lake). Files are organized in the feeds folder under unique feed names (e.g. sf_muni). Some transit systems combine bus and rail data; for others we collect only bus data.

There are two types of files for each transit system feed.

Daily Compacted Files

Each of these files contains minute-by-minute observations of individual fleet vehicles' position and status: approximately 1,440 batches per 24-hour period, combined in a single Parquet file. Batches are compacted at 24-hour intervals on different schedules and do not correspond to calendar days. Filenames follow the template COMPACTED_feedname_YYYY-MM-DD_HH:MM:SS.parquet. For example, COMPACTED_nyct_mta_bus_gtfsrt_2025-01-24_18:03:14.parquet contains all observed vehicle positions in the New York City Transit Buses GTFS-RT feed for the 24 hours ending at 6:03:14 PM on January 24, 2025.
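Because compacted files do not line up with calendar days, a single day of service usually spans two files. The sketch below shows one way to work out which files to fetch for a given date. The function names compacted_window and files_overlapping_day are hypothetical, and the timestamps are treated as naive datetimes because the time zone of the filename timestamp is not stated here.

```python
from datetime import date, datetime, time, timedelta

PREFIX = "COMPACTED_"
SUFFIX = ".parquet"
STAMP_FORMAT = "%Y-%m-%d_%H:%M:%S"
STAMP_LENGTH = len("YYYY-MM-DD_HH:MM:SS")  # 19 characters


def compacted_window(filename):
    """Return (start, end) of the 24-hour window a compacted file covers."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        raise ValueError(f"not a compacted file name: {filename!r}")
    stem = filename[len(PREFIX):-len(SUFFIX)]
    # The timestamp is the fixed-width tail of the stem; the feed name that
    # precedes it may itself contain underscores.
    end = datetime.strptime(stem[-STAMP_LENGTH:], STAMP_FORMAT)
    return end - timedelta(hours=24), end


def files_overlapping_day(filenames, day):
    """Return the compacted files whose window overlaps the calendar day."""
    day_start = datetime.combine(day, time.min)
    day_end = day_start + timedelta(days=1)
    selected = []
    for name in filenames:
        start, end = compacted_window(name)
        # Two intervals overlap when each starts before the other ends.
        if start < day_end and end > day_start:
            selected.append(name)
    return selected
```

Calling files_overlapping_day with a list of filenames from one feed and date(2025, 1, 24) would return every file whose 24-hour window touches that day; the rows still need to be filtered by their own timestamps after loading.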

Incoming Data

These files contain recently retrieved data that has not yet been compacted. Each file holds a single system-wide batch of observations. Filenames follow the template INCOMING_feedname_YYYY-MM-DD_HH:MM:SS.parquet. For example, INCOMING_nyct_mta_bus_gtfsrt_2025-01-25_18_02_05.parquet contains all observed vehicle positions in the New York City Transit Buses GTFS-RT feed at approximately 6:02 PM on January 25, 2025.
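Both file types encode the feed name and a timestamp in the filename, so a small parser can be handy when sorting or filtering files. The following is a minimal sketch, not an official tool: FeedFile and parse_feed_filename are hypothetical names. The template above separates the time fields with colons while the INCOMING example uses underscores, so the pattern accepts either as an assumption about what may appear in the bucket.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Accept ":" or "_" between the time fields (an assumption, see above).
FILENAME_PATTERN = re.compile(
    r"^(?P<kind>COMPACTED|INCOMING)_"
    r"(?P<feed>.+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<hour>\d{2})[:_](?P<minute>\d{2})[:_](?P<second>\d{2})"
    r"\.parquet$"
)


class FeedFile(NamedTuple):
    kind: str            # "COMPACTED" or "INCOMING"
    feed: str            # feed name, e.g. "nyct_mta_bus_gtfsrt"
    timestamp: datetime  # naive datetime taken from the filename


def parse_feed_filename(filename):
    """Split a Bus Observatory filename into kind, feed name, and timestamp."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    timestamp = datetime.strptime(
        f"{match['date']} {match['hour']}:{match['minute']}:{match['second']}",
        "%Y-%m-%d %H:%M:%S",
    )
    return FeedFile(match["kind"], match["feed"], timestamp)
```

The feed name is matched greedily, so feed names that contain underscores are kept whole.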

Table Format

For GTFS-RT files, we parse all fields in the feed, including vehicle positions, trip updates, and service alerts. For other feeds (e.g. NJTransit and NYC Transit SIRI), we parse a variety of vehicle position and status fields. Note: as of January 2025, collection of all non-GTFS-RT feeds has been suspended.

Getting Data

There are several typical ways to access data stored in a public S3 bucket:

Direct URL Access

You can access the files directly via their URL. For example: https://busobservatory-lake.s3.amazonaws.com/feeds/nyct_mta_bus_gtfsrt/COMPACTED_nyct_mta_bus_gtfsrt_2025-01-24_18:03:14.parquet
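The URL follows the bucket layout described above, so it can be assembled from a feed name and a filename. This is a rough sketch using only the Python standard library; feed_file_url and download_feed_file are hypothetical helper names, and the URL pattern is inferred from the single example above.

```python
import shutil
import urllib.request

BUCKET_URL = "https://busobservatory-lake.s3.amazonaws.com"


def feed_file_url(feed_name, filename):
    """Build the public URL for a file in a feed's folder."""
    return f"{BUCKET_URL}/feeds/{feed_name}/{filename}"


def download_feed_file(feed_name, filename, destination=None):
    """Stream one file to disk and return the local path."""
    # Colons are awkward in file names on some platforms, so the default
    # local name replaces them with underscores.
    destination = destination or filename.replace(":", "_")
    with urllib.request.urlopen(feed_file_url(feed_name, filename)) as response:
        with open(destination, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return destination
```

For example, download_feed_file("nyct_mta_bus_gtfsrt", "COMPACTED_nyct_mta_bus_gtfsrt_2025-01-24_18:03:14.parquet") would fetch the file shown above into the current directory.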

Using AWS CLI

You can use the AWS Command Line Interface to download files. For example:

aws s3 cp s3://busobservatory-lake/feeds/nyct_mta_bus_gtfsrt/COMPACTED_nyct_mta_bus_gtfsrt_2025-01-24_18:03:14.parquet .

Using Boto3 in Python

You can use the Boto3 library in Python to programmatically access the files. For example:


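import boto3
from botocore import UNSIGNED
from botocore.config import Config

# The bucket is public, so send unsigned (anonymous) requests; no AWS credentials are needed.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Arguments: bucket name, object key, local destination path.
s3.download_file(
    'busobservatory-lake',
    'feeds/nyct_mta_bus_gtfsrt/COMPACTED_nyct_mta_bus_gtfsrt_2025-01-24_18:03:14.parquet',
    'COMPACTED_nyct_mta_bus_gtfsrt_2025-01-24_18:03:14.parquet'
)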
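Downloading a file requires knowing its exact name, which includes a compaction timestamp that varies by feed. One way to discover the available files is to list the keys under a feed's folder. The sketch below assumes the bucket permits listing (this page only confirms that individual files are public) and that keys follow the feeds/feedname/ layout described above; iter_feed_keys is a hypothetical helper that takes an existing Boto3 S3 client.

```python
def iter_feed_keys(s3_client, feed_name, kind="COMPACTED",
                   bucket="busobservatory-lake"):
    """Yield every object key for one feed and file type.

    s3_client is a Boto3 S3 client; kind is "COMPACTED" or "INCOMING".
    """
    # Assumed key layout: feeds/<feed_name>/<KIND>_<feed_name>_<timestamp>.parquet
    prefix = f"feeds/{feed_name}/{kind}_{feed_name}_"
    # A listing returns at most 1,000 keys per response, so use a paginator.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (requires boto3 and network access):
#   import boto3
#   from botocore import UNSIGNED
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
#   for key in iter_feed_keys(s3, "nyct_mta_bus_gtfsrt"):
#       print(key)
```

Passing the client in as an argument keeps the helper independent of how credentials or anonymous access are configured.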

API Retired

The Bus Observatory API has been retired. All data is now available via the public Amazon Web Services S3 bucket only.

License

All data is licensed under a CC BY-NC 4.0 license. Please review the license terms before using this data.