MiSoSouP is a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset.
In the machine learning research community, it is generally believed that there is a tension between memorization and generalization. This paper examines the extent to which this tension exists, by exploring whether it is possible to generalize by memorizing alone.
The authors survey and discuss methods proposed in the literature for estimating the Sharpe ratio, computing confidence intervals around a point estimate of the Sharpe ratio, and performing hypothesis tests on a single Sharpe ratio and on the difference between two Sharpe ratios.
The authors provide a solution for the Efficient Server Audit Problem based on several new techniques, including simultaneous replay and efficient verification of concurrent executions, implemented for PHP web applications.
The Vera Institute of Justice (Vera) partnered with Two Sigma’s Data Clinic, a volunteer-based program that leverages employees’ data science expertise, to uncover the factors contributing to continued jail growth in rural areas.
The authors introduce a novel context-dependent simplification technique that improves the scalability of string solvers on challenging constraints coming from real-world problems.
The authors present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions.
This presentation discusses each part of the durable storage stack, from the hardware on up, and how usage numbers can take on different meanings at each layer. It covers what's important to know at each layer, and how to think about and talk about concepts like compression, fragmentation, write amplification, and wear leveling. Finally, it examines different ways benchmarketers present data deceptively, and provides some techniques for identifying and cutting through those kinds of misrepresentations.
Apache Arrow-based interconnection between various big data tools (SQL, UDFs, machine learning, big data frameworks, etc.) enables you to use them together seamlessly and efficiently.
The author introduces REST services and details how to design a REST API that is easy and pleasant to use.