Distributed transactions suffer from poor performance due to two major limiting factors. First, they incur high latency because each access to remote data requires a long network round trip. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and further degrading performance. The authors present Sundial, an in-memory distributed optimistic concurrency control protocol that addresses these two limitations.
MiSoSouP is a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset.
In the machine learning research community, it is generally believed that there is a tension between memorization and generalization. This paper examines the extent to which this tension exists, by exploring whether it is possible to generalize by memorizing alone.
The authors survey and discuss methods proposed in the literature for estimating the Sharpe ratio; computing confidence intervals around a point estimate of the Sharpe ratio; and performing hypothesis testing on a single Sharpe ratio and on the difference between two Sharpe ratios.
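To make the estimation and interval-construction tasks concrete, here is a minimal sketch in Python. It computes the sample Sharpe ratio and an approximate confidence interval using the classical asymptotic standard error sqrt((1 + SR²/2)/n), which assumes i.i.d. normal returns; the function names and the choice of this particular estimator are illustrative, not taken from the survey itself.

```python
import math

def sharpe_ratio(returns, risk_free=0.0):
    """Sample Sharpe ratio: mean excess return divided by its
    sample standard deviation (with the n-1 denominator)."""
    n = len(returns)
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / (n - 1)
    return mean / math.sqrt(var)

def sharpe_confidence_interval(returns, risk_free=0.0, z=1.96):
    """Approximate 95% confidence interval for the Sharpe ratio,
    using the asymptotic standard error sqrt((1 + SR^2/2)/n).
    Valid under an i.i.d. normal-returns assumption."""
    n = len(returns)
    sr = sharpe_ratio(returns, risk_free)
    se = math.sqrt((1 + sr * sr / 2) / n)
    return sr - z * se, sr + z * se
```

Under serially correlated or heavy-tailed returns this standard error is biased, which is one motivation for the alternative estimators and tests the survey covers.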
The NERF and Heads projects bring Linux back to the cloud servers' boot ROMs by replacing nearly all of the vendor firmware with a reproducibly built Linux runtime that acts as a fast, flexible, and measured boot loader.
The authors provide a solution for the Efficient Server Audit Problem based on several new techniques, including simultaneous replay and efficient verification of concurrent executions, implemented for PHP web applications.
Designing a system that can extract immediate insights from large amounts of data in real-time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that can continuously compute analytics on-the-fly. It also highlights a case study as an example of reactive design in action.
The author presents CelFS, Two Sigma's geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it relies on physical partitioning to provide quality-of-service guarantees, incurs a high replication overhead, and cannot take advantage of outsourced cold storage. The talk further describes Two Sigma's response to these limitations in Jaks, a new storage system designed to reduce the TCO of CelFS and serve as the backend for other systems at Two Sigma.
This presentation discusses the design and implementation of Smooth at Two Sigma, our experience running it over the past two years, ongoing challenges, and future directions.