Data Clinic’s history of using data to do good
As Two Sigma’s pro bono data- and tech-for-good initiative, we help mission-driven organizations use their data and technology more effectively. While most of our projects have been research-focused, with some lighter-weight tooling support, in the summer of 2020 we launched our first deeply collaborative engineering engagement, working in close concert with our friends at the New York Stem Cell Foundation (NYSCF). NYSCF, a nonprofit organization whose mission is to accelerate new treatments and cures for patients, is an innovative engineering partner engaged in large-scale, robust, and cutting-edge science.
Data Clinic’s relationship with NYSCF began in 2019, thanks to an introduction from Alfred Spector, Two Sigma’s retired CTO, who also serves as a Senior Scientific Advisor to NYSCF on various programs. Data Clinic and NYSCF first partnered on a data science project to improve the workflow of the NYSCF Global Stem Cell Array® (TGSCA™), a unique, fully automated platform for the production, expansion, gene editing, and differentiation of stem cells (you can read about this project here). On the heels of that successful collaboration, a new team of Data Clinic volunteers came together for an engineering project to optimize a critical piece of software that researchers use alongside the TGSCA™.
Advancing medical research with microscopic images of cells
The NYSCF Research Institute, an independent laboratory, is a nonprofit accelerator that bridges the gap between research institutions and pharmaceutical and biotech companies by reducing the cost, time, and risk that have historically inhibited the development of new treatments and cures. Since its founding in 2005, NYSCF has shed light on conditions including Alzheimer’s disease, diabetes, macular degeneration, multiple sclerosis, Parkinson’s disease, and COVID-19.
The NYSCF Global Stem Cell Array® can derive hundreds of induced pluripotent stem cell (iPSC) lines per month, a process in which NYSCF researchers revert cells from donated skin or blood samples into a stem cell state. Researchers then study disease by differentiating these iPSCs into cell types such as neurons, cardiomyocytes (heart cells), and hepatocytes (liver cells). NYSCF has generated iPSC lines from various patient cohorts, including people with Alzheimer’s disease and macular degeneration, and makes these cell lines available to the research community through the NYSCF Repository.
“The Data Clinic team has done outstanding work with our researchers in optimizing aspects of our data-analysis pipeline, and now working together on an engineering-focused project has been terrific,” said Dan Paull, PhD, SVP of Discovery and Platform Development at NYSCF. “This project has not only benefited our researchers, providing them a valuable tool for interacting with our data, but has given our software engineers a chance to draw on the experience of the talented developers at Two Sigma. We are incredibly grateful for their amazing pro bono help, and I’m looking forward to what we will accomplish together.”
A fundamental component of the NYSCF workflow is growing wells of cells, examining images of these cells, and making decisions based on their morphology and growth. When stem cells are generated for disease research, quality control often requires confirming that each sample of cells arises from a single ‘clone’, that is, a single cell. Although cell wells are imaged daily, the commercially available tools designed to help review ‘clonality’ were slow to respond, severely limiting NYSCF researchers’ bandwidth. Taking advantage of NYSCF’s expansive imaging dataset, data scientists at the foundation developed a deep learning framework (Monoqlo℠) that automatically detects the clonality of cell colonies from the images captured daily by the automated platform (published in Nature Machine Intelligence). Building on this, NYSCF developed a tool for visualizing Monoqlo℠ results and the associated images; however, as usage grew, the tool’s functional requirements exceeded NYSCF’s engineering bandwidth. Data Clinic’s goal was therefore to make the tool run faster, add features to track and annotate cell wells, and align the underlying codebase more closely with NYSCF’s growing family of web apps.
Improving NYSCF’s cell visualization tool
In the new Visualization Tool (VisTool), NYSCF researchers can view the clonality call and class of each cell well on a given plate, load and view images of each well over time, and flag specific wells for further review. Thanks to optimizations in image loading, the time required to view hundreds of wells dropped drastically: researchers can now review data more than 16 times faster. Among other practical benefits, collaboration is also substantially easier, since researchers can now share a link to any screen within the tool.
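Sharing a link to any screen implies that the state of each screen (which plate, which well, which day) can be reconstructed from the URL. The post does not describe how VisTool does this, so the following TypeScript is only a minimal sketch of one common approach, assuming the view state is kept in query parameters via the standard `URLSearchParams` API. The `ViewState` fields, the parameter names (`plate`, `well`, `day`, `flagged`), and the `vistool.example.org` host are all hypothetical.

```typescript
// Hypothetical view state for a single VisTool screen; field names are illustrative assumptions.
interface ViewState {
  plateId: string;
  wellId?: string;
  day?: number;
  flaggedOnly?: boolean;
}

// Serialize the view state into a query string that can be appended to the tool's URL.
function encodeViewState(state: ViewState): string {
  const params = new URLSearchParams();
  params.set("plate", state.plateId);
  if (state.wellId !== undefined) params.set("well", state.wellId);
  if (state.day !== undefined) params.set("day", String(state.day));
  if (state.flaggedOnly) params.set("flagged", "1");
  return params.toString();
}

// Parse a query string back into a view state; returns null if no plate is given
// and silently drops a non-numeric "day" value.
function decodeViewState(query: string): ViewState | null {
  const params = new URLSearchParams(query);
  const plateId = params.get("plate");
  if (plateId === null) return null;

  const state: ViewState = { plateId };
  const well = params.get("well");
  if (well !== null) state.wellId = well;

  const dayRaw = params.get("day");
  if (dayRaw !== null) {
    const day = Number.parseInt(dayRaw, 10);
    if (Number.isFinite(day)) state.day = day;
  }
  if (params.get("flagged") === "1") state.flaggedOnly = true;
  return state;
}

// Build a shareable link for a specific well on a specific day (hypothetical host).
const link = `https://vistool.example.org/plate?${encodeViewState({ plateId: "P-0042", wellId: "C07", day: 12 })}`;
console.log(link);
```

Keeping the state in the URL, rather than only in client memory, is what lets a colleague open the same screen from a pasted link; in a single-page app the same encoder would typically also be used to update the address bar as the user navigates.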
Development principles
The Data Clinic engineering team aimed to build a tool that is easy to maintain and extend. This meant keeping the codebase readable for new developers and using a component-based design whose pieces could be reused in other, similar workflows. The collaborative group worked through several specific technical questions arising from this mandate.
The future
We look forward to seeing how researchers spend the time saved by this more efficient cell monitoring tool on impactful research. Building on our success so far, we are also excited to seek out new engineering-based collaborations.