Scout Goes Global

Explore the potential of over 100 open data portals at your fingertips

Two years ago, as part of NYC Open Data Week, we unveiled Scout, our open source data discoverability tool for exploring New York City’s open data portal. Scout allows you to search for open datasets and recommends other datasets that might be relevant to you. The tool also surfaces thematically similar datasets as well as datasets that have joinable columns that can enrich the dataset you are looking at.
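To make the idea of "joinable columns" concrete, here is a minimal sketch of one common way to score whether two columns could be joined: the Jaccard overlap of their distinct values. This is an illustration only, not a description of how Scout computes joinability; the function names, the 0.5 threshold, and the sample data are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of the distinct values in two columns."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # two empty columns carry no evidence of joinability
    return len(set_a & set_b) / len(set_a | set_b)


def joinable_columns(
    left: Dict[str, List[str]],
    right: Dict[str, List[str]],
    threshold: float = 0.5,  # hypothetical cutoff, chosen for illustration
) -> List[Tuple[str, str, float]]:
    """Return (left_col, right_col, score) pairs whose overlap meets the threshold."""
    matches = []
    for left_name, left_values in left.items():
        for right_name, right_values in right.items():
            score = jaccard(left_values, right_values)
            if score >= threshold:
                matches.append((left_name, right_name, score))
    # Highest-scoring candidate joins first
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical sample columns from two datasets
inspections = {"zip": ["10001", "10002", "10003"], "grade": ["A", "B", "A"]}
complaints = {
    "incident_zip": ["10001", "10002", "10004"],
    "status": ["Open", "Closed", "Open"],
}

pairs = joinable_columns(inspections, complaints)
print(pairs)
```

In this toy example the two ZIP code columns share two of four distinct values, so they are the only pair that meets the threshold. A real system working across more than a million columns would likely need sampling or sketching techniques rather than exact set comparisons.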

The launch of Scout was met with excitement from the local open data community, but we knew that this was just scratching the surface of what was possible. There is a lot more data out there and the data needs of researchers typically go well beyond a single city. As part of NYC Open Data Week 2022, we unveiled the newest update to Scout: a multi-portal approach to data discoverability. The update allows Scout users to explore 129 portals across the world hosting over 125,000 datasets, and analyze relationships across more than 1.2 million columns.

A screengrab of Scout: a listing of NYC-related datasets and their descriptions under a drop-down menu to select any available open data portal to peruse
The new portal dropdown allows users to select specific data portals.

Scout now includes a portal selector dropdown where you can choose which open data portal you want to explore. In the top-right you can also toggle between exploring a single portal or looking at all available data portals. At the moment, Scout wraps the Socrata API, so the tool is limited to Socrata-hosted data portals. As part of our future roadmap we intend to expand this to include more open data management systems, such as CKAN and ESRI.
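For readers curious what searching Socrata-hosted portals looks like programmatically, here is a minimal sketch that builds a search request against Socrata's public Discovery API. It is not Scout's own code. The helper names are hypothetical, and the `q`, `domains`, and `limit` parameters and the `results`/`resource` response fields reflect our reading of the Discovery API documentation, so check them against the current docs before relying on them.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public catalog endpoint for Socrata's Discovery API
CATALOG_ENDPOINT = "https://api.us.socrata.com/api/catalog/v1"


def build_catalog_url(query: str, domain: Optional[str] = None, limit: int = 10) -> str:
    """Build a catalog search URL; omit `domain` to search across all portals."""
    params = {"q": query, "limit": limit}
    if domain is not None:
        params["domains"] = domain  # restrict the search to a single portal
    return f"{CATALOG_ENDPOINT}?{urlencode(params)}"


def search_dataset_names(query: str, domain: Optional[str] = None, limit: int = 10) -> List[str]:
    """Fetch matching catalog entries and return their names (requires network access)."""
    with urlopen(build_catalog_url(query, domain, limit)) as response:
        payload = json.load(response)
    # Assumed response shape: {"results": [{"resource": {"name": ...}}, ...]}
    return [item["resource"]["name"] for item in payload.get("results", [])]


single_portal_url = build_catalog_url("tree census", domain="data.cityofnewyork.us", limit=5)
all_portals_url = build_catalog_url("tree census", limit=5)
print(single_portal_url)
print(all_portals_url)
```

Leaving out the `domain` argument mirrors the toggle described above: the same query runs against every portal in the catalog instead of a single one.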

In addition to multi-portal support, existing Scout features were updated to improve the user experience.

Another screen grab of Scout showing a list of datasets and their descriptions under a pop-up ‘sign in’ modal that provides options to make an account or sign in using Google or Facebook
Users can now log in to Scout using existing social media accounts or with their email and a unique password.

Scout now supports user authentication, which lets you create a user account either through an existing social media account (Facebook or Google) or with your own email and password combination. This feature allows you to keep your curated collections permanently by linking them to your user account. We do not track your searches or your personal information beyond what is absolutely necessary for you to log in. If you prefer to continue using Scout without logging in, you can still do so and have access to all of Scout’s search features. However, without logging in, you will lose your stored collections when you clear your browser cache or switch computers.

A screengrab of Scout that shows a description of the data in the left-hand section, and the list of GitHub commits and repos that use that same data in the right-hand section
The new resources tab provides examples of how the dataset has been previously utilized.

This newest update also includes a resources tab that displays examples of a dataset’s usage, helping researchers understand how the dataset can be applied and how it has been used previously. We hope this feature will facilitate collaborations and accelerate innovation by showcasing past work that can be built upon and limiting redundant analyses.

In the future, we hope to incorporate blog posts that mention the data into the resources tab to continue to build a library of inspiration for users. We also see a lot of potential in enabling users to submit their own helpful resources directly into Scout.

We are excited to see how this new iteration of Scout can help open data researchers find helpful datasets that extend beyond their local communities. We encourage you to check out Scout and share any feedback you have with us as we continue to beta test this tool. If you find any bugs, please open a GitHub issue for us to address. We also encourage you to open a GitHub discussion with any feature ideas or contribute by cloning our repo. Our repo contains instructions on setting up Scout locally and contributing to its codebase. We are actively working to improve Scout and would love to have you involved.


This article is not an endorsement by Two Sigma of the papers discussed, their viewpoints or the companies discussed. The views expressed above reflect those of the authors and are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever.