Authors: Saurabh Goel
Presented at: CMU Time Series Database Lectures – Fall 2017
Abstract: Smooth is a distributed storage system for managing structured time series data at Two Sigma. Smooth’s design emphasizes scale, both in terms of size and aggregate request bandwidth, reliability and storage efficiency. It is optimized for large parallel streaming read/write accesses over provided time ranges. Smooth has a clear separation between the metadata and data layers, and supports multiple pluggable object stores for storing data files. Data can be replicated or moved between different stores and data centers to support availability, performance and storage tiering objectives. Smooth is widely used at Two Sigma by various applications including modeling research workflows, data pipelines and various data analysis jobs. Smooth has been in development for about 5 years, currently stores multiple PBs of compressed data, and serves peak aggregate throughput in excess of 100 GB/s. In this talk, the author discusses the design and implementation of Smooth, our experience running it over the past two years, ongoing challenges, and future directions.