Algorithmic Data Science = Theory + Practice

Author: Matteo Riondato (Two Sigma)

Presented at: IEEE MIT Undergraduate Technology Research Conference, Massachusetts Institute of Technology, Cambridge, MA

Abstract: Obtaining actionable insights from large datasets requires methods that are, at once, fast, scalable, and statistically sound. This is the field of study of algorithmic data science, a discipline at the border of computer science and statistics. In this talk I outline the fundamental questions that motivate research in this area, present a general framework for formulating many problems in this field, introduce the challenges in balancing theoretical and statistical correctness with practical efficiency, and show how sampling-based algorithms are extremely effective at striking the correct balance in many situations, giving examples from social network analysis and pattern mining. I conclude with some research directions and areas for future exploration.

This article is not an endorsement by Two Sigma of the papers discussed, their viewpoints or the companies discussed. The views expressed above reflect those of the authors and are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever.