How much of a problem is school bullying in New York City? The answer depends on who (and how) you ask.
In June 2015, the Environmental Defense Fund joined with the Two Sigma Data Clinic to build a preliminary predictive model for violations from oil and gas well inspection data.
One way allocators can improve their inflation forecasts is to analyze inflation from as many perspectives as possible, just as a data scientist would.
A Two Sigma researcher introduces the Pandas UDFs feature in the upcoming Apache Spark 2.3 release, which substantially improves the performance and usability of user-defined functions (UDFs) in Python.
The authors present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions.
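To make the idea concrete, here is a minimal sketch of a reservoir-sampling triangle estimator for the simpler insertion-only case. It is an illustration in the spirit of the approach described above, not the authors' implementation. It does not handle edge deletions or local (per-vertex) counts, and it assumes each edge appears once with no self-loops. The class and method names are hypothetical.

```python
import random
from collections import defaultdict


class ReservoirTriangleCounter:
    """Estimate the global triangle count of an insertion-only edge stream.

    Keeps a uniform sample of at most `capacity` edges (reservoir sampling)
    and counts the triangles that lie entirely inside the sample.
    """

    def __init__(self, capacity, seed=None):
        if capacity < 3:
            raise ValueError("capacity must be at least 3")
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.edges = []                     # edges currently in the sample
        self.neighbors = defaultdict(set)   # adjacency of the sampled graph
        self.t = 0                          # number of edges seen so far
        self.sample_triangles = 0           # triangles inside the sample

    def _count_shared(self, u, v):
        # Each common neighbor of u and v closes one triangle with (u, v).
        return len(self.neighbors[u] & self.neighbors[v])

    def _remove_random_edge(self):
        idx = self.rng.randrange(len(self.edges))
        a, b = self.edges[idx]
        self.edges[idx] = self.edges[-1]
        self.edges.pop()
        self.neighbors[a].discard(b)
        self.neighbors[b].discard(a)
        self.sample_triangles -= self._count_shared(a, b)

    def add_edge(self, u, v):
        self.t += 1
        if self.t > self.capacity:
            # Standard reservoir step: keep the new edge with prob. M / t.
            if self.rng.random() >= self.capacity / self.t:
                return
            self._remove_random_edge()
        self.sample_triangles += self._count_shared(u, v)
        self.edges.append((u, v))
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def estimate(self):
        # Scale by the inverse probability that all three edges of a
        # triangle are in a uniform sample of M out of t edges.
        t, m = self.t, self.capacity
        scale = max(1.0, t * (t - 1) * (t - 2) / (m * (m - 1) * (m - 2)))
        return scale * self.sample_triangles
```

When the whole stream fits in the sample, the scale factor is 1 and the estimate equals the exact triangle count. Once the stream outgrows the sample, the estimate becomes a random quantity whose variance depends on the sample size.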
Apache Arrow-based interconnection between various big data tools (SQL, UDFs, machine learning, big data frameworks, etc.) enables you to use them together seamlessly and efficiently.
An overview of Rademacher Averages, a fundamental concept from statistical learning theory that can be used to derive uniform sample-dependent bounds on the deviation of sample averages from their expectations.
Given a large graph, the authors aim to produce a concise lossy representation (a summary) that can be stored in main memory and used to approximately answer queries about the original graph much faster than by using the exact representation.
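As a small illustration, the empirical Rademacher average of a finite function class on a fixed sample is the expectation, over independent uniform random signs, of the largest signed sample mean achieved by any function in the class. The sketch below computes it exactly by enumerating every sign vector, so it is only practical for very small samples. The function name and input layout are assumptions made for this example.

```python
from itertools import product


def empirical_rademacher_average(function_values):
    """Exact empirical Rademacher average of a finite function class.

    function_values[k][i] holds f_k(x_i): the value of the k-th function
    on the i-th sample point. All rows must have the same length n >= 1.
    """
    if not function_values or not function_values[0]:
        raise ValueError("need at least one function and one sample point")
    n = len(function_values[0])
    total = 0.0
    # Enumerate all 2^n sign vectors (sigma_1, ..., sigma_n) in {-1, +1}^n.
    for signs in product((-1, 1), repeat=n):
        # Supremum over the class of the signed sample mean.
        total += max(
            sum(s * v for s, v in zip(signs, values)) / n
            for values in function_values
        )
    return total / 2 ** n


# A single function cannot correlate with random signs on average,
# while the pair {f, -f} always matches the sign of the sum.
single = empirical_rademacher_average([[1, 1]])
pair = empirical_rademacher_average([[1, 1], [-1, -1]])
```

Richer classes fit random signs better and so score higher, which is why this quantity appears in uniform deviation bounds. For larger samples, a Monte Carlo average over randomly drawn sign vectors is the usual substitute for full enumeration.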
Berkeley’s Professor David E. Culler discusses the future of data science, the “Berkeley view” of the field, and the biggest challenges for data scientists today.
The authors present an algorithm to help detect new information and events in a network by computing an optimal probing schedule that minimizes the average novelty of undetected items.