Engineering with Open Source

The author walks through how to build a metrics system for a high-performance data platform, looking at some of the important factors to consider when choosing which open source offerings to use.

Sundial for VLDB

Distributed transactions suffer from poor performance due to two major limiting factors. First, they have high latency because each access to remote data incurs a long network delay. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and low performance. The authors present Sundial, an in-memory distributed optimistic concurrency control protocol that addresses these two limitations.

Learning and Memorization

In the machine learning research community, it is generally believed that there is a tension between memorization and generalization. This paper examines the extent to which this tension exists by exploring whether it is possible to generalize by memorizing alone.