Introducing Flint: A Time-Series Library for Apache Spark
An overview of Flint, an open source library Two Sigma developed to enhance Spark’s functionality for time series analysis.
A Two Sigma engineer explains how Ibis works with BigQuery to provide an elegant and flexible Python interface for composing SQL queries.
An overview of best practices derived from building a machine-learning-based starter bot for Halite, Two Sigma’s public artificial intelligence programming challenge.
Architecture overview for the future of the Python Pandas data analytics library.
An overview of BeakerX, a collection of kernels and extensions to the Jupyter interactive computing platform.
A Two Sigma researcher introduces the Pandas UDFs feature in the upcoming Apache Spark 2.3 release, which substantially improves the performance and usability of user-defined functions (UDFs) in Python.
Apache Arrow-based interconnection between various big data tools (SQL, UDFs, machine learning, big data frameworks, etc.) enables you to use them together seamlessly and efficiently.
Speed and repeatability are crucial in machine learning, but the latter is not guaranteed in TensorFlow. A Two Sigma researcher demonstrates a workaround to attain repeatable results.