Berkeley’s Professor David E. Culler discusses the future of data science, the “Berkeley view” of the field, and the biggest challenges for data scientists today.
Cook, Two Sigma’s open-source resource scheduler for compute clusters, uses preemption to achieve low latency and high throughput.
The authors lay out the fundamental concepts behind OCR, a new runtime system designed to meet the needs of extreme-scale computing, and compare its performance with MPI's on two simple benchmarks.
The ABRA suite of algorithms computes and maintains high-quality approximations of the betweenness centrality of all nodes or edges on static and fully dynamic graphs.
TRIÈST is a suite of sampling-based, one-pass algorithms for approximate triangle counting from fully dynamic edge streams.
The author introduces Huohua, Two Sigma’s implementation of highly optimized time series operations in Spark.
A comparison of centrality measures on big graphs, the quality guarantees of approximation algorithms, and the trade-offs and scalability behaviors of distributed algorithms.
A Two Sigma quantitative engineer walks through a paper on the history of formulas for computing meta-statistics on any distribution.
Higher-order Attribute Contraction Schemes (HACS), a language for programming compilers, makes it possible to create a full compiler from a single source file.
The author introduces Cook, an advanced fair-sharing, preemptive scheduling backend for Spark.