A Two Sigma researcher introduces the Pandas UDFs feature in the upcoming Apache Spark 2.3 release, which substantially improves the performance and usability of user-defined functions (UDFs) in Python.
The authors provide a solution for the Efficient Server Audit Problem based on several new techniques, including simultaneous replay and efficient verification of concurrent executions, implemented for PHP web applications.
Designing a system that can extract immediate insights from large amounts of data in real time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that can continuously compute analytics on the fly. It also highlights a case study as an example of reactive design in action.
The cost of rolling futures contracts, rather than the decline in commodity prices, has been the largest drag on commodity index performance over the past 10 years. Asset allocators’ best response may be to develop dynamic execution strategies that mitigate the roll return “tax,” although such strategies are difficult to implement.
The author presents CelFS, Two Sigma’s geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it relies on physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage. The talk further describes our response to these limitations in Jaks, a new storage system designed to reduce the total cost of ownership (TCO) of CelFS and serve as the backend for other systems at Two Sigma.
This presentation discusses the design and implementation of Smooth at Two Sigma, our experience running it over the past two years, ongoing challenges, and future directions.
A senior Two Sigma researcher provides an overview of some of the most interesting deep learning research from ICML 2017.
A group of Two Sigma statisticians highlights a selection of interesting talks and presentations from the 2017 Joint Statistical Meetings.
A Two Sigma engineer explains how the company implemented a high-availability networking architecture using OpenStack that's suitable for deployment at enterprise scale and beyond.