HUOHUA: A Distributed Time Series Analysis Framework For Spark

Posted on June 7, 2016

Author: Wenbo Zhao

Presented at: Spark Summit 2016, San Francisco, CA

Abstract: The ability to analyze time series data at scale is critical for the success of finance and IoT applications based on Spark. This session introduced Huohua, Two Sigma’s implementation of highly optimized time series operations in Spark. Huohua performs parallel, rich analyses on time series data by taking advantage of the natural ordering of that data to provide locality-based optimizations. Huohua is an open source library for Spark built around the OrderedRDD, a time-series-aware data structure, together with a collection of time series utility and analysis functions that use OrderedRDDs. Unlike DataFrames and Datasets, Huohua’s OrderedRDDs can leverage two things: the existing ordering of datasets at rest, and the fact that almost all data manipulation and analysis over these datasets preserves that ordering. Huohua differs from other time series efforts in Spark in its ability to compute efficiently across panel data and on large-scale, high-frequency data. The talk covered the architecture of OrderedRDDs and their integration with Spark SQL, DataFrames, and Datasets, as well as the analysis tools built on top to merge, join, aggregate, and intervalize data, and to compute windowed, rolling, and cycle-based summarizations and cross-panel analysis. It also presented results comparing Huohua to similar operations using off-the-shelf RDDs and DataFrames.
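
To illustrate why existing ordering matters, here is a minimal sketch in plain Python of a temporal ("as-of") join over two inputs that are already sorted by timestamp. This is not Huohua's API and does not use Spark; the function name `asof_join`, the `tolerance` parameter, and the sample data are all hypothetical and chosen only for illustration. The point is that when both sides are sorted, a single forward scan is enough, so no shuffle or re-sort is required.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical row shape: (timestamp, value). Both inputs must be sorted by timestamp.
Row = Tuple[int, float]


def asof_join(
    left: List[Row], right: List[Row], tolerance: int
) -> Iterator[Tuple[int, float, Optional[float]]]:
    """For each left row, attach the most recent right value whose timestamp
    is <= the left timestamp and within `tolerance`; otherwise attach None."""
    j = -1  # index of the latest right row with timestamp <= current left timestamp
    for ts, value in left:
        # The right cursor only moves forward because both inputs are sorted,
        # so the whole join is a single linear pass over each side.
        while j + 1 < len(right) and right[j + 1][0] <= ts:
            j += 1
        if j >= 0 and ts - right[j][0] <= tolerance:
            yield ts, value, right[j][1]
        else:
            yield ts, value, None


# Illustrative data only.
trades = [(1, 10.0), (5, 11.0), (9, 12.0)]
quotes = [(0, 9.5), (4, 10.5), (20, 13.0)]
joined = list(asof_join(trades, quotes, tolerance=2))
```

In a distributed setting the same idea would apply per partition, assuming each partition covers a contiguous time range; how Huohua actually partitions and joins data is not described here.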


The views expressed above are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever.
