The Data Clinic and Aalborg University’s Department of the Built Environment (BUILD) teams are thrilled to announce, today at FOSS4G 2022 in Florence, the open source repo for our work building fine-grained open source geographies that facilitate actionable research and insights. Our partnership focused on supporting BUILD’s goal of empowering data-informed action for more equitable economic opportunity across Denmark.
Over the years, the BUILD team has developed innovative insights and new metrics related to quality of life using Statistics Denmark’s fine-grained administrative data, recorded on a 100x100m grid. However, they knew the true value would lie in opening these actionable insights to the public: Danish municipality decision-makers, community organizations, and researchers alike.
Until now, valuable data on the wellbeing of citizens has been openly available only at the level of large, heterogeneous regions, making data-informed action tailored to urban versus rural areas next to impossible. More granular data, which requires strong data science skills to use responsibly, is available only to select authorized agencies and researchers. BUILD set out to find the right group of stakeholders to develop and open source a new set of regions that sit between these two extremes.
With the data from Statistics Denmark in hand, BUILD secured vital financial support through a grant from data.org’s Inclusive Growth and Recovery Challenge, and connected with our team at Data Clinic to provide geospatial expertise and collaborate on the construction of these new not-too-big, not-too-small geographies. In short, they are the equivalent of U.S. census tracts for Denmark.
Building Socially Meaningful Polygons
Together, the Data Clinic and BUILD teams aimed to step away from the 100x100m square grid. While it was computationally convenient, it didn’t reflect the world as it is today. Urban areas, neighborhoods, and communities have grown and adapted to naturally occurring and people-made boundaries. Communities on one side of a wide river or a highway tend to have slightly different characteristics than those on the other side. We wanted this evolution to be reflected in the insights that BUILD eventually releases.
To develop these socially meaningful polygons, the team pulled in Denmark’s road network from OpenStreetMap and municipality borders from Datafordeleren (a Danish open data portal). Using a combination of existing packages (e.g. sf, lwgeom) and custom scripts in both R and Python, we ended up with more intuitive building blocks.
Meeting Data Privacy Requirements Through Clustering
Although we had developed a more natural set of foundational polygons, we then needed to reintegrate Statistics Denmark’s 100x100m grid. The reasoning was twofold. First, because administrative data currently exists only at this grid level, the results would be immediately usable by everyone with access to it. Second, confirming that we meet Denmark’s strict privacy requirements across all years of interest became a straightforward calculation, as no interpolation or additional effort from Statistics Denmark would be needed. Thus, we mapped the grid cells back onto our new polygons, assigning each cell to the polygon with which it has the maximum overlap.
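To make the maximum-overlap rule concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code in our repo: it assumes polygons can be approximated as axis-aligned rectangles, and the names (overlap_area, assign_cell, "west", "east", c1 to c3) and coordinates are hypothetical. Real polygons would need a geometry library to compute intersections.

```python
from typing import Dict, Optional, Tuple

# Axis-aligned rectangle as (min_x, min_y, max_x, max_y), in metres.
# Assumption: rectangles stand in for real, irregular polygons.
Box = Tuple[float, float, float, float]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def assign_cell(cell: Box, polygons: Dict[str, Box]) -> Optional[str]:
    """Return the id of the polygon with the largest overlap, or None."""
    best_id, best_area = None, 0.0
    for poly_id, poly in polygons.items():
        area = overlap_area(cell, poly)
        if area > best_area:
            best_id, best_area = poly_id, area
    return best_id


# Hypothetical data: two polygons separated by a "road" at x = 130.
polygons = {"west": (0, 0, 130, 200), "east": (130, 0, 300, 200)}
cells = {
    "c1": (0, 0, 100, 100),    # entirely inside "west"
    "c2": (100, 0, 200, 100),  # 30 m wide in "west", 70 m wide in "east"
    "c3": (200, 0, 300, 100),  # entirely inside "east"
}
assignment = {cid: assign_cell(box, polygons) for cid, box in cells.items()}
print(assignment)
```

Cell c2 straddles the boundary and goes to "east" because the larger share of its area lies there. Each grid cell ends up in exactly one polygon, which is what keeps the later privacy check a simple sum over cells.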
Using the Max-P regionalization algorithm from the spopt Python library, along with custom clustering processes, we clustered adjoining ‘gridified’ polygons until each new region represented a minimum of 100 people and a minimum of 50 households over the entire 30+ years of data that BUILD will eventually release.
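The threshold logic can be sketched with a simple greedy merge. This is deliberately not Max-P and not our custom processes: it is a simplified stand-in that only shows how adjacency and the minimum counts interact. We assume here that the thresholds must hold in every individual year; the function names, region ids, and counts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

MIN_PEOPLE = 100
MIN_HOUSEHOLDS = 50

# One (people, households) pair per year for each region.
Counts = List[Tuple[int, int]]


def meets_threshold(counts: Counts) -> bool:
    """True if every year has enough people and enough households."""
    return all(p >= MIN_PEOPLE and h >= MIN_HOUSEHOLDS for p, h in counts)


def merge_until_valid(
    counts: Dict[str, Counts], neighbors: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Greedily merge each failing region into its least populated neighbor.

    Assumes `neighbors` is symmetric. Returns surviving region id -> members.
    """
    counts = {k: list(v) for k, v in counts.items()}
    neighbors = {k: set(v) for k, v in neighbors.items()}
    members = {k: {k} for k in counts}

    while True:
        failing = sorted(k for k in counts if not meets_threshold(counts[k]))
        candidates = [k for k in failing if neighbors[k]]
        if not candidates:
            return members  # all pass, or the failing regions are isolated
        small = candidates[0]
        # Pick the neighbor with the lowest minimum yearly population.
        target = min(
            sorted(neighbors[small]),
            key=lambda n: min(p for p, _ in counts[n]),
        )
        counts[target] = [
            (p1 + p2, h1 + h2)
            for (p1, h1), (p2, h2) in zip(counts[target], counts[small])
        ]
        members[target] |= members.pop(small)
        # Rewire adjacency so the merged region inherits small's neighbors.
        for n in neighbors.pop(small):
            neighbors[n].discard(small)
            if n != target:
                neighbors[n].add(target)
                neighbors[target].add(n)
        del counts[small]


# Hypothetical chain of four polygons, a - b - c - d, with two years of data.
yearly = {
    "a": [(40, 20), (45, 22)],
    "b": [(70, 35), (80, 40)],
    "c": [(150, 60), (160, 65)],
    "d": [(30, 10), (35, 12)],
}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
regions = merge_until_valid(yearly, adjacency)
print(regions)
```

In this toy chain, a merges into b and d merges into c, leaving two regions that satisfy both minimums in both years. A greedy pass like this gives no guarantee about the number or homogeneity of the resulting regions, which is why a dedicated regionalization algorithm is the better fit for real data.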
With this, the Beta version of our collaboration’s granular geographies was complete.
There’s always room to improve methodologies, and both the BUILD and Data Clinic teams are excited to take next steps to ensure that all stakeholders find value in both these new regions and the tool BUILD is developing to publicly distribute their Danish economic prosperity metrics at this new level (slated for release in 2023).
To confirm or refute our hypothesis that natural and people-made boundaries would be a strong proxy for building homogeneous population clusters, BUILD will proceed with spatial and longitudinal analyses to understand the variation in the regions’ characteristics.
If our new regions allow for statistically robust analyses across time and factors, our vision is that community organizations and municipality decision-makers will be able to better understand the evolution of their regions over time and make data-informed decisions on how to enable more opportunities for economic prosperity for their constituents. As we share the kinds of data that are available at this level, and the ways they can be investigated through BUILD’s upcoming user-friendly tool, we are eager to incorporate these stakeholders’ feedback into all levels of the product and methodology.
Visit the Repo
We are thrilled to have the open source repo with the work to date now open to the public, and we look forward to receiving community feedback from our peer geospatial researchers and enthusiasts at such a seminal industry conference as FOSS4G. We invite you to contribute to the repo by reviewing, submitting issues, and commenting, or email us with questions or thoughts at email@example.com.