This talk presents CelFS, Two Sigma’s geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it relies on physical partitioning to provide quality-of-service guarantees, carries a high replication overhead, and cannot take advantage of outsourced cold storage. The talk then describes our response to these limitations: Jaks, a new storage system designed to reduce the total cost of ownership (TCO) of CelFS and to serve as the backend for other systems at Two Sigma.
A group of Two Sigma statisticians highlight a selection of interesting talks from the 2017 Joint Statistical Meetings.
Writing in the Wall Street Journal, Two Sigma co-founder David Siegel argues that embracing the scientific method in investment management brings much-needed rigor to the process, while helping to counteract common but harmful biases.
The Information Age has transformed a wide range of industries, from medicine to transportation and beyond, bringing spectacular advances. Investment management has been, by some measures, slower to evolve, but it, too, is changing.
Berkeley’s Professor David E. Culler discusses the future of data science, the “Berkeley view” of the field, and the biggest challenges for data scientists today.
Cook, Two Sigma’s open-source resource scheduler for compute clusters, uses preemption to achieve low latency and high throughput.
A Two Sigma quantitative engineer walks through a paper on the history of formulas for computing meta-statistics on any distribution.
The author introduces Cook, an advanced fair-sharing, preemptive scheduling backend for Spark.