Responsive and Scalable Real-time Data Analytics

Designing a system that can extract immediate insights from large amounts of data in real time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that can continuously compute analytics on the fly. It also highlights a case study as an example of reactive design in action.

Archival Storage at Two Sigma

The author presents CelFS, Two Sigma’s geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it uses physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage. The talk then describes our response to these limitations: Jaks, a new storage system designed to reduce the total cost of ownership (TCO) of CelFS and to serve as the backend for other systems at Two Sigma.

The Language of Compression

This presentation discusses each part of the durable storage stack, from the hardware on up, and how usage numbers can take on different meanings at each layer. It covers what's important to know at each layer, and how to think about and talk about concepts like compression, fragmentation, write amplification, and wear leveling. Finally, it examines different ways benchmarketers present data deceptively, and provides some techniques for identifying and cutting through those kinds of misrepresentations.