Who is it for?


Researchers

Longitudinal, multi-city data can inform investigations into the factors that affect bus system performance and the other influences that determine ridership.

Students

Cornell Tech students have made everything from maps to machine-learning models from our data. What can you do?

Advocates

Data was the driver for Transit Center's highly effective Bus Turnaround campaign in New York City. Power your campaign with big bus data.

Get Started


We collect bulk bus position data from transit systems around the world, sampled at one-minute intervals. The Bus Observatory API allows you to query and retrieve this data one "route-hour" at a time (60 minutes' worth of observations for a single route).

1. Download A Sample Data Set

We have prepared several data sets for data science explorations. All are extracted from the New York City Transit SIRI feed (nyct_mta_bus_siri):

  • One route-day. July 5, 2022. M1 route only. (CSV.tar.gz)
  • One route-month. July 1-31, 2022. M1 route only. (CSV.tar.gz)
  • One system-hour. July 7, 2022. All routes. (CSV.tar.gz)
  • One system-day. July 5, 2022. All routes. (CSV.tar.gz)
  • One system-month. July 1-31, 2022. All routes. (CSV.tar.gz)
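Once a sample archive is downloaded, it can be read without unpacking it by hand. The sketch below uses only the Python standard library and assumes the archive contains one or more plain CSV files with a header row. The function name read_csv_archive and the file name in the usage comment are hypothetical, and the actual column names depend on the feed.

```python
import csv
import io
import tarfile


def read_csv_archive(path):
    """Return the rows of every .csv member in a .tar.gz archive as dicts."""
    rows = []
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows.extend(csv.DictReader(text))
    return rows


# Usage (hypothetical file name; use the name of the archive you downloaded):
# rows = read_csv_archive("m1_route_day.csv.tar.gz")
# print(len(rows), "observations")
```

Every value comes back as a string, so convert timestamps and coordinates before analysis.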

2. Try A Query

  • Browse the list of active feeds in the table below, and make note of the system_id.
  • Visit the Bus Observatory API's Swagger utility, and click "Try It Out". Enter the system_id, a route name or number from that system, and a year, month, day, and hour.

Name                                            Place                   system_id            Type & Schema
Massachusetts Bay Transit Authority             Boston, MA, US          mbta_all             gtfsrt
NJTransit                                       NJ, US                  njtransit_bus        njxml
New York City Transit (GTFS-RT)                 New York City, NY, US   nyct_mta_bus_gtfsrt  gtfsrt
New York City Transit (SIRI)                    New York City, NY, US   nyct_mta_bus_siri    siri
San Francisco Transportation Agency             San Francisco, CA, US   sf_muni              gtfsrt
Transport for New South Wales                   Sydney, NSW, AU         tfnsw_bus            gtfsrt
Washington Metropolitan Area Transit Authority  Washington, DC, US      wmata_bus            gtfsrt

Next Steps


3. Review the Data License

All data is licensed under a CC BY-NC 4.0 license. Please review the license terms before using this data.

4. Query the API Directly

Data can also be accessed directly via the API endpoint. The format is:

https://api.busobservatory.org/buses/bulk/{system_id}/{route}/{year}/{month}/{day}/{hour}

For example, to get all of the positions recorded for the M1 route from the New York City Transit SIRI feed (nyct_mta_bus_siri) between 9pm and 10pm on July 10, 2022, go to https://api.busobservatory.org/buses/bulk/nyct_mta_bus_siri/M1/2022/7/10/21.
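To avoid typos when assembling these URLs by hand, the path can be built from its parts. This is a minimal sketch: route_hour_url is a hypothetical helper, not part of the API, and the date check only rejects impossible calendar values (it does not know which route-hours actually have data).

```python
from datetime import datetime

BASE_URL = "https://api.busobservatory.org/buses/bulk"


def route_hour_url(system_id, route, year, month, day, hour):
    """Build the bulk endpoint URL for one route-hour."""
    # Raises ValueError for impossible values such as month 13 or hour 24.
    datetime(year, month, day, hour)
    return f"{BASE_URL}/{system_id}/{route}/{year}/{month}/{day}/{hour}"


print(route_hour_url("nyct_mta_bus_siri", "M1", 2022, 7, 10, 21))
```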

Or, send a request from a command line using curl:

API via CLI
curl -X 'GET' \
    'https://api.busobservatory.org/buses/bulk/nyct_mta_bus_siri/M1/2022/7/4/21' \
    -H 'accept: application/json'
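
The same request can be sent from Python. The sketch below uses only the standard library; build_request and fetch_json are hypothetical helper names, and the 120-second default timeout is an assumption chosen to leave room for slow bulk responses.

```python
import json
import urllib.request


def build_request(url):
    """Create a GET request that asks for JSON, like the curl call."""
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )


def fetch_json(url, timeout=120):
    """Send the request and decode the JSON body (timeout is an assumed value)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


# Usage (performs a live network request):
# data = fetch_json(
#     "https://api.busobservatory.org/buses/bulk/nyct_mta_bus_siri/M1/2022/7/4/21"
# )
```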

5. Get Sequences of Data

To retrieve more route-hour bulk data sets, you can request sequences or sets of route-hours. For example, the following Python function:

  1. takes a system_id, route, and start and end time (in ISO 8601 format) as arguments,
  2. generates a list of dates and hours within this interval,
  3. retrieves the bulk data for each hour from the Bus Observatory API, and
  4. combines these responses into a single Pandas dataframe.
Download to DataFrame
import pandas as pd
import requests

def get_buses(system_id, route, start, end):
    """Fetch every route-hour from start to end (both inclusive) into one DataFrame."""

    # One timestamp per hour in the interval; both endpoints are included.
    times = pd.date_range(
        start=pd.Timestamp(start), end=pd.Timestamp(end), freq=pd.Timedelta(hours=1)
    )

    frames = []
    for t in times:
        url = f"https://api.busobservatory.org/buses/bulk/{system_id}/{route}/{t.year}/{t.month}/{t.day}/{t.hour}"
        r = requests.get(url).json()
        frames.append(pd.DataFrame.from_dict(r["result"]))

    # Combine all hourly responses once the loop has finished.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True, sort=False)


Then, grabbing a whole day's data for a single route and writing the results to a single Parquet file is as simple as the following (to_parquet requires the pyarrow or fastparquet package):

Get A Data Sequence
get_buses(
    'nyct_mta_bus_siri',
    'M1',
    '2022-07-21T00:00:00',
    '2022-07-21T23:00:00'  # the end hour is included, so this covers 24 route-hours
    ).to_parquet('buses.parquet')

Notes


Latency & Cold Starts

This service is implemented entirely through serverless technologies. It is not intended to be a backend for web services. You may experience response times in excess of 60 seconds for bulk data requests.

Rate Limiting

API access is rate limited. If you receive a 429 Too Many Requests error, please wait and try again later.
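
One way to "wait and try again" is exponential backoff. The sketch below is an assumption about client behavior, not documented API guidance: fetch_with_backoff is a hypothetical helper, the delays (2, 4, 8, ... seconds) and the limit of five attempts are arbitrary choices, and it expects a fetch function that raises urllib.error.HTTPError on an error status, as urllib.request.urlopen does. With requests, call raise_for_status() and catch requests.HTTPError instead.

```python
import time
import urllib.error


def fetch_with_backoff(fetch, url, max_tries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying with doubling delays on 429 responses."""
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Give up on other errors, or when the retries are used up.
            if err.code != 429 or attempt == max_tries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage, with any function that takes a URL and returns the decoded response:
# data = fetch_with_backoff(my_fetch_function, url)
```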

Bugs

Please report any problems you have using the API.