Two years ago, as part of NYC Open Data Week, we unveiled Scout, our open source data discoverability tool for exploring New York City’s open data portal. Scout lets you search for open datasets and recommends related ones: datasets that are thematically similar, and datasets with joinable columns that can enrich the one you are looking at.
The launch of Scout was met with excitement from the local open data community, but we knew that this was just scratching the surface of what was possible. There is a lot more data out there and the data needs of researchers typically go well beyond a single city. As part of NYC Open Data Week 2022, we unveiled the newest update to Scout: a multi-portal approach to data discoverability. The update allows Scout users to explore 129 portals across the world hosting over 125,000 datasets, and analyze relationships across more than 1.2 million columns.
Scout now includes a portal selector dropdown where you can choose which open data portal you want to explore. In the top-right corner, you can also toggle between exploring a single portal and searching across all available portals. At the moment, Scout wraps the Socrata API, so the tool is limited to Socrata-hosted data portals. As part of our future roadmap, we intend to expand support to other open data management systems, such as CKAN and ESRI.
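To illustrate the difference between single-portal and all-portal search, here is a minimal sketch that builds a query URL for Socrata's public Discovery (catalog) API. This is not Scout's code: the helper name `catalog_query_url` is hypothetical, and we assume the US endpoint (`api.us.socrata.com`) with its `q`, `domains`, and `limit` parameters. Leaving out `domains` searches every portal the endpoint indexes, while setting it restricts results to one portal.

```python
from urllib.parse import urlencode

# Assumption: the public Socrata Discovery (catalog) API endpoint for US-hosted portals.
CATALOG_URL = "https://api.us.socrata.com/api/catalog/v1"


def catalog_query_url(query, domain=None, limit=10):
    """Build a catalog search URL (hypothetical helper, for illustration only).

    domain=None searches across all portals; a domain such as
    "data.cityofnewyork.us" restricts the search to that single portal.
    """
    params = {"q": query, "limit": limit}
    if domain is not None:
        # Single-portal mode: only return assets hosted on this domain.
        params["domains"] = domain
    return f"{CATALOG_URL}?{urlencode(params)}"


# Single portal vs. all portals for the same search term.
print(catalog_query_url("taxi", domain="data.cityofnewyork.us", limit=5))
print(catalog_query_url("taxi"))
```

Fetching either URL (for example with `urllib.request.urlopen`) returns JSON search results; the sketch stops at URL construction so it runs without network access.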
In addition to multi-portal support, existing Scout features were updated to improve the user experience.
Scout now supports user accounts: you can sign up with an existing social media account (Facebook or Google) or with your own email and password. Linking your curated collections to an account keeps them saved permanently. We do not track your searches or collect any personal information beyond what is strictly necessary for you to log in. If you prefer to keep using Scout without logging in, you still have access to all of Scout’s search features. However, collections created without an account are lost when you clear your browser cache or switch computers.
This update also adds a resources tab that shows examples of how a dataset has been used, helping researchers understand how it can be applied. We hope this feature will facilitate collaboration and accelerate innovation by showcasing past work that can be built upon and by reducing redundant analyses.
In the future, we hope to add blog posts that mention a dataset to its resources tab, continuing to build a library of inspiration for users. We also see a lot of potential in letting users submit their own resources directly to Scout.
We are excited to see how this new iteration of Scout can help open data researchers find useful datasets beyond their local communities. We encourage you to check it out and share any feedback with us as we continue to beta test the tool. If you find a bug, please open a GitHub issue so we can address it.
We also encourage you to open a GitHub discussion with any feature ideas, or to contribute by cloning our repo. The repo includes instructions for setting up Scout locally and contributing to its codebase. We are actively working to improve Scout and would love to have you involved.