Increasing Access to Global Transportation Data

by Data Clinic
Two Sigma Data Clinic recently partnered with MobilityData to launch an API to the Mobility Database Catalogs, a repository of more than 2,000 transportation feeds from across the world.

MobilityData, a Canadian nonprofit on a mission to improve travelers’ information, was founded in 2015. Today, the organization has 20 employees and works with transport agencies, software vendors, mobility apps, and cities to standardize and expand data formats such as the General Transit Feed Specification (GTFS) and the General Bikeshare Feed Specification (GBFS) for public transport and shared mobility.

In 2023, our team had the opportunity to meet with the MobilityData team at the launch event for TREC, a Data Clinic tool that uses GTFS data to identify transit stops at risk due to climate change. During the development of TREC, we experienced firsthand the challenges of sourcing GTFS data at scale. Despite the immense value of publishing open data, making that data easily accessible remains a significant obstacle. The same insight drove us to build our open data discovery tool, Scout, in 2020.

Independently, MobilityData recognized a similar challenge in public transit and created the Mobility Database Catalogs to source transit feed data from providers around the world. GTFS data serves as a common format for public transit schedules and powers a number of valuable applications, such as mapping tools (e.g., Google Maps and Apple Maps) and transit accessibility research. Recognizing our shared commitment to creating open source code and improving access to public data, we were enthusiastic about collaborating.

“Our goal at MobilityData is simple: Better transportation through data,” said Eric Plosky, Executive Director at MobilityData. “Making GTFS data easier for journey planners and other consumers to discover means more riders will have access to up-to-date directions they can trust.”

Building an open data API

The next phase of MobilityData’s work was to simplify access to the catalog beyond the GitHub repository by building an API service that lets software developers and other data consumers programmatically access the contents of the Mobility Database.

Carrying on our philosophy from previous projects, we prioritized technical choices that would make it easy for our teams to collaborate and allow the work to live on beyond our partnership. For this project, we chose to write the API in Python with the FastAPI framework, using PostgreSQL for the database.

We also decided which engineering efforts were best suited for our Data Clinic volunteers. Open-source engineering projects running in production benefit from a long-term maintainer handling DevOps issues such as hosting, continuous integration/continuous deployment, and long-running code, such as ETL pipelines. Accordingly, we divided the project into two broad sections: data infrastructure and app development, with Data Clinic volunteers assisting with the latter.

Keeping these considerations and constraints in mind, volunteers built out two key API endpoints: one for feeds, representing the metadata for GTFS Schedule and GTFS Realtime (GTFS-RT) feeds, and one for GTFS Schedule datasets, each representing a point-in-time snapshot of a feed taken whenever it is updated or changed. In developing a new project, there are often moments when early assumptions need to be rethought, and this project was no different. One early decision was to nest the API endpoints under a common prefix: /feeds/gtfs and /feeds/gtfs_rt. Recognizing an ergonomic improvement, Data Clinic volunteers suggested separating those endpoint paths and implemented them as /gtfs and /gtfs_rt.
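To make the distinction between feeds and datasets concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's actual code: the class names, field names, and helper functions (Feed, Dataset, add_snapshot, nested_path, flat_path) are hypothetical, and the rule that only GTFS Schedule feeds receive snapshots is an assumption based on the description above. Only the four path strings come from the project itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical names throughout; none of these are taken from the real API.
FEED_TYPES = ("gtfs", "gtfs_rt")


@dataclass
class Dataset:
    """A point-in-time snapshot of a GTFS Schedule feed."""
    dataset_id: str
    feed_id: str
    downloaded_at: datetime


@dataclass
class Feed:
    """Metadata describing a GTFS Schedule or GTFS Realtime feed."""
    feed_id: str
    feed_type: str  # one of FEED_TYPES
    provider: str
    datasets: list[Dataset] = field(default_factory=list)

    def add_snapshot(self, downloaded_at: datetime) -> Dataset:
        # Assumption for this sketch: only GTFS Schedule feeds have datasets.
        if self.feed_type != "gtfs":
            raise ValueError("only GTFS Schedule feeds have datasets here")
        dataset = Dataset(
            dataset_id=f"{self.feed_id}-{len(self.datasets) + 1}",
            feed_id=self.feed_id,
            downloaded_at=downloaded_at,
        )
        self.datasets.append(dataset)
        return dataset


def nested_path(feed_type: str) -> str:
    """Early layout: both feed types sit under a shared /feeds prefix."""
    return f"/feeds/{feed_type}"


def flat_path(feed_type: str) -> str:
    """Revised layout: each feed type gets its own top-level path."""
    return f"/{feed_type}"


# Usage: one feed, one snapshot, and both path layouts side by side.
feed = Feed(feed_id="example-1", feed_type="gtfs", provider="Example Transit")
feed.add_snapshot(datetime(2024, 1, 1, tzinfo=timezone.utc))
for feed_type in FEED_TYPES:
    print(nested_path(feed_type), "->", flat_path(feed_type))
```

With the flat layout, each feed type is addressed directly, so a client that only consumes GTFS Realtime metadata never has to pass through a shared /feeds prefix.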

For further details about the launch of the API, please refer to the official press release from MobilityData. Those interested in using the API can register here and access the API documentation here.

“Collaboration is at the heart of all the work we do and we’re incredibly grateful for the contributions from Two Sigma Data Clinic,” said Eric Plosky. “This was a key first step to making the Mobility Database the definitive open platform for GTFS globally.”

Continued and future developments

At the 2023 MobilityData North American Workshop in New York City, the MobilityData and Data Clinic teams presented an early beta version of the API to an audience of transit advocates, transit agencies, trip planning apps, and other open data enthusiasts. The session drew interest from stakeholders across the public transit space, demonstrating the need for such a product.

The Data Clinic volunteer team continued working on the Mobility Feeds API into 2024, contributing a new endpoint that gives users additional information from the GTFS Validator. The next phase of this project involves creating a user interface that employs the API to make the database easily accessible to users without coding knowledge.

Stay tuned for future updates as we continue work on this exciting effort to further democratize access to transportation data in collaboration with MobilityData.

This article is not an endorsement by Two Sigma of the papers discussed, their viewpoints or the companies discussed. The views expressed above reflect those of the authors and are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever.