Who is it for?
Researchers
Longitudinal, multi-city data can inform investigations into the factors that affect bus system performance and the other influences that determine ridership.
Students
Cornell Tech students have made everything from maps to machine-learning models from our data. What can you do?
Advocates
Data was the driver for Transit Center's highly effective Bus Turnaround campaign in New York City. Power your campaign with big bus data.
Get Started
We collect bulk bus position data from transit systems around the world, sampled at one-minute intervals. The Bus Observatory API lets you query and retrieve this data one "route-hour" at a time (60 minutes' worth of observations for a single route).
1. Download A Sample Data Set
We have prepared several data sets for data science explorations. All are extracted from the New York City Transit SIRI feed (nyct_mta_bus_siri). For efficiency and data type integrity, they are provided as parquet files, which can be loaded into any modern data science tool.
- One route-day. July 5, 2023. M1 route only. (parquet)
- One route-month. July 1-31, 2023. M1 route only. (parquet)
- One system-day. July 5, 2023. All routes. (parquet)
- One system-month. July 1-31, 2023. All routes. (parquet)
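As a minimal sketch of loading one of these files, the snippet below reads a parquet file with pandas and reports its size and column types. The helper names load_sample and overview and the file name in the usage comment are placeholders, not part of the Bus Observatory tooling, and nothing here assumes a particular column schema.

```python
import pandas as pd


def load_sample(path):
    """Read one of the sample parquet files into a DataFrame."""
    # pandas needs a parquet engine (pyarrow or fastparquet) to be installed.
    return pd.read_parquet(path)


def overview(df):
    """Return row count, column names, and dtypes without assuming a schema."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "dtypes": {name: str(dtype) for name, dtype in df.dtypes.items()},
    }


# Hypothetical usage; replace the placeholder with the file you downloaded:
# df = load_sample("sample_route_day.parquet")
# print(overview(df))
```

Inspecting the columns first is a reasonable starting point, since the SIRI and GTFS-RT feeds use different schemas.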
2. Try A Query
- Browse the list of active feeds in the table below, and make note of the system_id.
- Visit the Bus Observatory API's Swagger utility, and click "Try It Out". Enter the system_id, a route name or number from that system, and a year, month, day, and hour.
Name | Place | system_id | Type & Schema |
---|---|---|---|
Massachusetts Bay Transit Authority | Boston, MA, US | mbta_all | gtfsrt |
New York City Transit (GTFS-RT) | New York City, NY, US | nyct_mta_bus_gtfsrt | gtfsrt |
New York City Transit (SIRI) | New York City, NY, US | nyct_mta_bus_siri | siri |
San Francisco Muni | San Francisco, CA, US | sf_muni | gtfsrt |
Transport for New South Wales | Sydney, NSW, AU | tfnsw_bus | gtfsrt |
Washington Metropolitan Area Transit Authority | Washington, DC, US | wmata_bus | gtfsrt |
Next Steps
3. Review the Data License
All data is licensed under a CC BY-NC 4.0 license. Please review the license terms before using this data.
4. Query the API Directly
Data can also be accessed directly via the API endpoint. The format is:
https://api.busobservatory.org/buses/bulk/{system_id}/{route}/{year}/{month}/{day}/{hour}
For example, to get all of the positions recorded for the M1 route from the New York City MTA Buses SIRI feed between 9pm and 10pm on July 10, 2022, go to https://api.busobservatory.org/buses/bulk/nyct_mta_bus_siri/M1/2022/7/10/21.
Or, send a request from a command line using curl:
curl -X 'GET' \
  'https://api.busobservatory.org/buses/bulk/nyct_mta_bus_siri/M1/2022/7/4/21' \
  -H 'accept: application/json'
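The same URL can be assembled in Python from its parts. This is a small sketch of the endpoint format given above; the function name route_hour_url and the BASE_URL constant are illustrative choices, not part of the API.

```python
from datetime import datetime

# Base of the bulk endpoint, taken from the URL format described above.
BASE_URL = "https://api.busobservatory.org/buses/bulk"


def route_hour_url(system_id, route, when):
    """Build the bulk-data URL for one route-hour.

    Month, day, and hour are written without zero padding,
    matching the example URLs in this guide.
    """
    return (
        f"{BASE_URL}/{system_id}/{route}/"
        f"{when.year}/{when.month}/{when.day}/{when.hour}"
    )


# 9pm on July 4, 2022 for the M1 route of the NYC SIRI feed.
print(route_hour_url("nyct_mta_bus_siri", "M1", datetime(2022, 7, 4, 21)))
```

The printed URL is the one used in the curl command.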
5. Get Sequences of Data
To retrieve more route-hour bulk data sets, you can request sequences or sets of route-hours. For example, the following Python function:
- takes a system_id, route, and start and end time (in ISO 8601 format) as arguments,
- generates a list of dates and hours within this interval,
- retrieves the bulk data for each hour from the Bus Observatory API, and
- combines these responses into a single Pandas dataframe.
import pandas as pd
import requests


def get_buses(system_id, route, start, end):
    # One timestamp per hour from start to end (both endpoints are included).
    times = (
        pd.date_range(start=pd.Timestamp(start), end=pd.Timestamp(end), freq="h")
        .to_pydatetime()
        .tolist()
    )
    frames = []
    for t in times:
        url = f"https://api.busobservatory.org/buses/bulk/{system_id}/{route}/{t.year}/{t.month}/{t.day}/{t.hour}"
        # Fetch one route-hour and parse the JSON response.
        r = requests.get(url).json()
        frames.append(pd.DataFrame.from_dict(r["result"]))
    # Combine all route-hours into a single dataframe.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True, sort=False)
Then, grabbing a whole day's data for a single route, and writing it to a single parquet file from the results is as simple as:
get_buses(
    'nyct_mta_bus_siri',
    'M1',
    '2022-07-21T00:00:00',
    '2022-07-22T00:00:00'
).to_parquet('buses.parquet')
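One detail worth checking: pandas.date_range includes both endpoints by default, so a midnight-to-midnight interval covers 25 hourly steps rather than 24. The sketch below reproduces that counting with the standard library only; hourly_steps is an illustrative helper, not part of the API or of the function above.

```python
from datetime import datetime, timedelta


def hourly_steps(start, end):
    """List hourly datetimes from start to end, including both endpoints."""
    current = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    steps = []
    while current <= stop:
        steps.append(current)
        current += timedelta(hours=1)
    return steps


# Midnight to midnight includes the first hour of the following day.
full = hourly_steps("2022-07-21T00:00:00", "2022-07-22T00:00:00")
# Ending at 23:00 yields exactly the 24 route-hours of July 21.
day_only = hourly_steps("2022-07-21T00:00:00", "2022-07-21T23:00:00")
print(len(full), len(day_only))  # 25 24
```

If you want exactly one calendar day, ending the interval at 23:00 avoids the extra request.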
Notes
Latency & Cold Starts
This service is implemented entirely through serverless technologies. It is not intended to be a backend for web services. You may experience response times in excess of 60 seconds for bulk data requests.
Rate Limiting
API access is rate limited. If you receive a 429 Too Many Requests error, please wait and try again later.
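One way to follow this advice in a script is to retry with increasing pauses and allow a generous timeout for slow responses. The sketch below uses only the Python standard library; the function name fetch_route_hour, the retry count, the delays, and the 120-second timeout are all assumptions chosen for illustration, not values published by the service.

```python
import time
import urllib.error
import urllib.request


def fetch_route_hour(url, max_tries=5, base_delay=2.0, timeout=120,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, retrying with exponential backoff on HTTP 429.

    opener and sleep are injectable so the retry logic can be
    exercised without touching the network.
    """
    for attempt in range(max_tries):
        try:
            # A long timeout leaves room for slow bulk responses.
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the last failure.
            if err.code != 429 or attempt == max_tries - 1:
                raise
            # Wait 2s, 4s, 8s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The function returns the raw response bytes, which can be parsed with json.loads. With the requests library, the equivalent check would be on the response's status_code.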
Bugs
Please report any problems you have using the API.