MiSoSouP is a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset.
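To make the task concrete, here is a minimal Python sketch. It assumes weighted relative accuracy (WRAcc) as the interestingness measure, and it describes a subgroup as an itemset that a transaction must contain. The names `wracc` and `top_k_on_sample` are hypothetical. The sketch only ranks candidate subgroups on a uniform random sample; it is not MiSoSouP and does not reproduce its guarantees on approximation quality.

```python
import random
from typing import FrozenSet, Sequence


def wracc(transactions: Sequence[FrozenSet[str]],
          labels: Sequence[bool],
          subgroup: FrozenSet[str]) -> float:
    """Weighted relative accuracy: coverage * (target rate in subgroup - overall target rate)."""
    n = len(transactions)
    # Indices of transactions that contain every item describing the subgroup.
    covered = [i for i, t in enumerate(transactions) if subgroup <= t]
    if n == 0 or not covered:
        return 0.0
    coverage = len(covered) / n
    p_sub = sum(labels[i] for i in covered) / len(covered)
    p_all = sum(labels) / n
    return coverage * (p_sub - p_all)


def top_k_on_sample(transactions: Sequence[FrozenSet[str]],
                    labels: Sequence[bool],
                    candidates: Sequence[FrozenSet[str]],
                    sample_size: int,
                    k: int,
                    seed: int = 0) -> list:
    """Rank candidate subgroups by WRAcc measured on a uniform sample (without replacement)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(transactions)), sample_size)
    sample_t = [transactions[i] for i in idx]
    sample_l = [labels[i] for i in idx]
    ranked = sorted(candidates, key=lambda g: wracc(sample_t, sample_l, g), reverse=True)
    return ranked[:k]
```

Scoring on a sample trades exactness for speed. The point of the summary above is that MiSoSouP's approximations are high quality, and this naive sketch makes no such promise.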
Two Sigma’s Co-Founder discusses algorithmic investing, self-driving cars, blockchain, education, and more.
How much of a problem is school bullying in New York City? The answer depends on who (and how) you ask.
In June 2015, the Environmental Defense Fund partnered with the Two Sigma Data Clinic to build a preliminary predictive model of violations from oil and gas well inspection data.
One way allocators can improve their inflation forecasts is to analyze inflation from as many perspectives as possible, just as a data scientist would.
A Two Sigma researcher introduces the Pandas UDFs feature in the upcoming Apache Spark 2.3 release, which substantially improves the performance and usability of user-defined functions (UDFs) in Python.
Designing a system that can extract immediate insights from large amounts of data in real time requires a special way of thinking. This talk presents a "reactive" approach to designing real-time, responsive, and scalable data applications that continuously compute analytics on the fly. It also presents a case study of reactive design in action.
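As an illustration of the push-based style described above, here is a minimal Python sketch built on a simple observer pattern. Each stage reacts to values as they arrive. A running mean is updated incrementally, so the analytic is always current without recomputing over history. The `Stream` and `RunningMean` classes are hypothetical; they are not the design from the talk or from any reactive library.

```python
from typing import Callable, List


class Stream:
    """Minimal push-based stream: every emitted value is pushed to all subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[float], None]] = []

    def subscribe(self, fn: Callable[[float], None]) -> None:
        self._subscribers.append(fn)

    def emit(self, value: float) -> None:
        for fn in self._subscribers:
            fn(value)

    def map(self, f: Callable[[float], float]) -> "Stream":
        """Derived stream that reacts to each upstream value by emitting f(value)."""
        out = Stream()
        self.subscribe(lambda v: out.emit(f(v)))
        return out

    def filter(self, pred: Callable[[float], bool]) -> "Stream":
        """Derived stream that forwards only values satisfying pred."""
        out = Stream()

        def forward(v: float) -> None:
            if pred(v):
                out.emit(v)

        self.subscribe(forward)
        return out


class RunningMean:
    """Subscriber that updates a mean incrementally in O(1) per value."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def __call__(self, value: float) -> None:
        self.count += 1
        self.mean += (value - self.mean) / self.count


# Usage: drop negative readings, double the rest, keep a live average.
source = Stream()
avg = RunningMean()
source.filter(lambda v: v >= 0).map(lambda v: v * 2).subscribe(avg)
for reading in (1.0, -5.0, 3.0):
    source.emit(reading)
```

Because the pipeline is wired once and then driven by `emit`, new data flows through without any polling. The aggregate holds only constant state, no matter how long the stream runs.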
The authors present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully dynamic graph represented as an adversarial stream of edge insertions and deletions.
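The core reservoir idea can be sketched in Python for the insertion-only case. This sketch follows the TRIÈST-BASE variant as we understand it:

- keep a uniform sample of at most M edges;
- maintain the number of triangles τ in the sampled graph;
- scale τ by ξ = max(1, t(t-1)(t-2) / (M(M-1)(M-2))), where t is the number of edges seen.

It assumes a stream of distinct, undirected edges without self-loops. Handling deletions (the fully dynamic case) requires additional machinery that is not shown, and local (per-vertex) counts are omitted. The class name `TriestBase` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple


class TriestBase:
    """Insertion-only global triangle estimate from a reservoir of at most `memory` edges."""

    def __init__(self, memory: int, seed: int = 0) -> None:
        if memory < 6:
            raise ValueError("memory must be at least 6")
        self.M = memory
        self.t = 0    # number of edges seen so far
        self.tau = 0  # number of triangles in the sampled graph
        self.adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
        self.edges: List[Tuple[Hashable, Hashable]] = []  # list allows O(1) uniform eviction
        self.rng = random.Random(seed)

    def _update_counters(self, u: Hashable, v: Hashable, sign: int) -> None:
        # Triangles in the sample that contain edge (u, v) = common neighbors of u and v.
        self.tau += sign * len(self.adj[u] & self.adj[v])

    def _evict_random_edge(self) -> None:
        i = self.rng.randrange(len(self.edges))
        u, v = self.edges[i]
        self.edges[i] = self.edges[-1]  # swap-remove keeps eviction O(1)
        self.edges.pop()
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self._update_counters(u, v, -1)

    def insert(self, u: Hashable, v: Hashable) -> None:
        self.t += 1
        if self.t > self.M:
            # Reservoir sampling: keep the new edge with probability M / t.
            if self.rng.random() >= self.M / self.t:
                return
            self._evict_random_edge()
        self.edges.append((u, v))
        self.adj[u].add(v)
        self.adj[v].add(u)
        self._update_counters(u, v, +1)

    def estimate(self) -> float:
        t, M = self.t, self.M
        xi = max(1.0, t * (t - 1) * (t - 2) / (M * (M - 1) * (M - 2)))
        return xi * self.tau
```

When M is at least the number of edges in the stream, nothing is evicted and the estimate equals the exact triangle count. For example, feeding the six edges of a complete graph on four vertices with `memory=10` yields 4.0.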
An Apache Arrow-based interconnection between various big data tools (SQL, UDFs, machine learning, big data frameworks, etc.) lets you use them together seamlessly and efficiently.
An overview of Rademacher averages, a fundamental concept from statistical learning theory that can be used to derive uniform, sample-dependent bounds on the deviation of sample averages from their expectations.
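Consider a finite family of functions and a sample x_1, ..., x_m. The empirical Rademacher average is the expectation, over independent uniform signs σ_i ∈ {-1, +1}, of the supremum over f of (1/m) Σ σ_i f(x_i). The Python sketch below computes it in two ways: exactly, by enumerating all 2^m sign vectors (feasible only for small m), and approximately, by Monte Carlo. The function names are hypothetical, and the family is passed in as a precomputed matrix of function values.

```python
import itertools
import random
from typing import Sequence


def _sup_correlation(signs: Sequence[int], values: Sequence[Sequence[float]]) -> float:
    """max over functions f_j of (1/m) * sum_i sigma_i * f_j(x_i)."""
    m = len(signs)
    return max(sum(s * v for s, v in zip(signs, row)) / m for row in values)


def rademacher_exact(values: Sequence[Sequence[float]]) -> float:
    """Exact empirical Rademacher average; values[j][i] = f_j(x_i). Cost grows as 2^m."""
    m = len(values[0])
    total = sum(_sup_correlation(signs, values)
                for signs in itertools.product((-1, 1), repeat=m))
    return total / 2 ** m


def rademacher_monte_carlo(values: Sequence[Sequence[float]],
                           trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate: average the supremum over randomly drawn sign vectors."""
    rng = random.Random(seed)
    m = len(values[0])
    total = 0.0
    for _ in range(trials):
        signs = [rng.choice((-1, 1)) for _ in range(m)]
        total += _sup_correlation(signs, values)
    return total / trials
```

Take the family {f, -f} with f(x_1) = f(x_2) = 1. Its exact value is 0.5: the supremum equals |σ_1 + σ_2| / 2, which is 1 for two of the four sign vectors and 0 for the other two. A single function always gives 0, because the signs average out.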