Sundial for VLDB

Two major factors limit the performance of distributed transactions. First, they suffer from high latency because each access to remote data incurs a long network delay. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and low performance. The authors present Sundial, an in-memory distributed optimistic concurrency control protocol that addresses both limitations.

Learning and Memorization

In the machine learning research community, it is generally believed that there is a tension between memorization and generalization. This paper examines the extent to which this tension exists, by exploring whether it is possible to generalize by memorizing alone.

Responsive and Scalable Real-time Data Analytics

Designing a system that can extract immediate insights from large amounts of data in real time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that continuously compute analytics on the fly. It also presents a case study of reactive design in action.

Archival Storage at Two Sigma

The author presents CelFS, Two Sigma’s geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it relies on physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage. The talk further describes our response to these limitations in Jaks, a new storage system designed to reduce the total cost of ownership (TCO) of CelFS and to serve as the backend for other systems at Two Sigma.