The State of Open Data on School Bullying

How much of a problem is school bullying in NYC? The answer depends on who you ask. Data Clinic volunteers compared local surveys (where many students say bullying is happening) with federal data (where a majority of schools report zero incidents) to analyze these disparities for the 2013-14 school year.

Responsive and Scalable Real-time Data Analytics

Designing a system that can extract immediate insights from large amounts of data in real time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that can continuously compute analytics on the fly. It also presents a case study of reactive design in action.

Archival Storage at Two Sigma

The author presents CelFS, Two Sigma’s geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it relies on physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage. The talk further describes our response to these limitations in Jaks, a new storage system designed to reduce the total cost of ownership (TCO) of CelFS and to serve as the backend for other systems at Two Sigma.