An overview of Flint, an open source library Two Sigma developed to enhance Spark’s functionality for time series analysis.
Two Sigma security researcher Trammell Hudson presents his work on the new LinuxBoot firmware—now an official Linux Foundation project.
A Two Sigma researcher introduces the Pandas UDFs feature in the upcoming Apache Spark 2.3 release, which substantially improves the performance and usability of user-defined functions (UDFs) in Python.
The author presents CelFS, Two Sigma’s geo-distributed file system. Although CelFS has scaled to serve tens of petabytes of data, it uses physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage. The talk further describes our response to these limitations in Jaks, a new storage system designed to reduce the TCO of CelFS and serve as the backend for other systems at Two Sigma.
Apache Arrow-based interconnection between various big data tools (SQL, UDFs, machine learning, big data frameworks, etc.) enables you to use them together seamlessly and efficiently.
The author introduces Huohua, Two Sigma’s implementation of highly optimized time series operations in Spark.