A Two Sigma quantitative engineer walks through a paper on the history of formulas for computing meta-statistics on any distribution.
At Two Sigma, we look for inspiration wherever we can, and academic papers are some of the richest sources out there. We love papers so much that we're a Platinum Sponsor of the New York chapter of Papers We Love.
In addition to sponsoring, multiple Two Sigma employees have presented some of their favorite papers at Papers We Love meetups. Matt Adereth, one of our quantitative engineers, recently presented at the San Francisco chapter on a paper entitled "A Scalable Bootstrap for Massive Data."
A key ingredient of our methodology is using statistics to understand the systems that we build and run, but few people consider how good those statistics are. In statistics, we want to estimate some property of an (often unknown) distribution when we only have a small set of observed data. We use an estimator to calculate this estimate from the data, and it is important to understand the quality of that estimator. Estimator Quality Estimators are a form of statistics about statistics, such as a standard error or a confidence interval, that can be used to gain insight into the accuracy and precision of results.
The classical formulas for computing these statistics involve complex mathematics and work only for a select few statistics on a select few distributions. With the advent of computers, however, it turned out that some simple algorithms, using only basic algebra, can compute them for any statistic on any distribution!
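To make the idea concrete, here is a minimal sketch of one such algorithm, the classic bootstrap, written in Python using only the standard library. It is an illustration rather than the code from the talk: the function name `bootstrap_standard_error`, its parameters, and the synthetic sample are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` on `data` by resampling.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n points with replacement from the observed data.
        resample = rng.choices(data, k=n)
        # Recompute the statistic on the resampled data.
        estimates.append(statistic(resample))
    # The spread of the recomputed statistics approximates the
    # standard error of the original estimate.
    return statistics.stdev(estimates)


# Synthetic observations standing in for real measurements (an assumption).
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

# The same procedure works for any statistic, with no statistic-specific formula.
se_mean = bootstrap_standard_error(sample, statistics.mean)
se_median = bootstrap_standard_error(sample, statistics.median)
print(f"standard error of the mean:   {se_mean:.3f}")
print(f"standard error of the median: {se_median:.3f}")
```

Nothing in the function depends on which statistic is passed in, so the mean and the median are handled identically, even though the median has no simple closed-form standard error for an arbitrary distribution.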
In this talk, Matt walks through the history of the solutions, all the way from the earliest algorithms of the 1940s to the modern distributed approaches that can run on "big data."