Berkeley’s Professor David E. Culler discusses the future of data science, the “Berkeley view” of the field, and the biggest challenges for data scientists today.
Using ant colonies and beehives as a starting point, this presentation examines a handful of natural and computer systems to illustrate how to cast system-wide problems into solutions at the individual component level, yielding remarkably simple algorithms for complex collective behaviors.
Cook, Two Sigma’s open-source resource scheduler for compute clusters, uses preemption to achieve low latency and high throughput.
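To make the preemption idea concrete, here is a toy single-machine simulation (not Cook's actual policy, and the job names and tick-based model are illustrative assumptions): at every time slot the highest-priority job that has arrived runs, so an arriving high-priority job preempts a lower-priority one, which resumes later.

```python
def run(jobs):
    """Toy preemptive priority scheduler, one machine, unit time slots.

    jobs: dict name -> (arrival, priority, work), lower priority
    number = more urgent. Returns the job name run in each slot
    (None for an idle slot). This is an illustration of preemption,
    not Cook's scheduling algorithm.
    """
    remaining = {n: w for n, (a, p, w) in jobs.items()}
    timeline, t = [], 0
    while any(remaining.values()):
        # jobs that have arrived and still have work left
        ready = [(p, n) for n, (a, p, w) in jobs.items()
                 if a <= t and remaining[n] > 0]
        if ready:
            _, n = min(ready)        # most urgent job preempts the rest
            remaining[n] -= 1
            timeline.append(n)
        else:
            timeline.append(None)    # machine idles until next arrival
        t += 1
    return timeline
```

For example, a long batch job started at time 0 is interrupted by an urgent job arriving at time 1, then finishes afterward.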
The authors lay out the fundamental concepts behind OCR (a new runtime system designed to meet the needs of extreme-scale computing) and compare its performance to that of MPI on two simple benchmarks.
The ABRA suite of algorithms computes and maintains high-quality approximations of the betweenness centrality of all nodes or edges on static and fully dynamic graphs.
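The following sketch illustrates the sampling idea underlying such approximations, not the ABRA algorithms themselves: sample random source-target pairs, and estimate for each node the fraction of pairs whose shortest path passes through it. For simplicity it takes a single BFS shortest path rather than sampling uniformly among all shortest paths, as a rigorous estimator would.

```python
import random
from collections import deque, defaultdict

def bfs_path(adj, s, t):
    """Return one shortest s-t path (list of nodes) via BFS, or None."""
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return None

def sample_bc(adj, nodes, num_samples, seed=0):
    """Estimate, for each node, the fraction of random source-target
    pairs whose shortest path passes through it as an interior node --
    a simplified, sampling-based proxy for betweenness centrality."""
    rng = random.Random(seed)
    bc = defaultdict(float)
    for _ in range(num_samples):
        s, t = rng.sample(nodes, 2)
        path = bfs_path(adj, s, t)
        if path:
            for v in path[1:-1]:      # credit interior nodes only
                bc[v] += 1.0 / num_samples
    return bc
```

On a star graph, only the hub ever lies between two other nodes, so its estimate dominates while the leaves stay at zero.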
TRIÈST is a suite of sampling-based, one-pass algorithms for approximate triangle counting over fully dynamic edge streams.
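A simplified, insert-only sketch of the reservoir-sampling idea behind this approach (edge deletions and the paper's full estimator variants are omitted): keep a uniform sample of at most m edges, maintain a triangle count over the sample, and rescale to estimate the stream's total.

```python
import random
from collections import defaultdict

class TriestBase:
    """Insert-only sketch of a TRIEST-style one-pass triangle counter.

    A fixed-capacity reservoir holds a uniform sample of the edges
    seen so far; a counter tracks triangles formed among sampled
    edges and is rescaled into a global estimate.
    """

    def __init__(self, m, seed=0):
        self.m = m                    # reservoir capacity
        self.t = 0                    # edges seen in the stream
        self.sample = set()           # sampled edges as frozensets
        self.adj = defaultdict(set)   # adjacency over sampled edges
        self.tau = 0.0                # triangles among sampled edges
        self.rng = random.Random(seed)

    def _update(self, u, v, delta):
        # every sampled common neighbor closes one triangle with (u, v)
        self.tau += delta * len(self.adj[u] & self.adj[v])

    def add_edge(self, u, v):
        self.t += 1
        if self.t <= self.m:
            keep = True
        elif self.rng.random() < self.m / self.t:
            old = self.rng.choice(list(self.sample))  # evict at random
            self.sample.discard(old)
            a, b = tuple(old)
            self._update(a, b, -1)
            self.adj[a].discard(b)
            self.adj[b].discard(a)
            keep = True
        else:
            keep = False
        if keep:
            self._update(u, v, +1)
            self.sample.add(frozenset((u, v)))
            self.adj[u].add(v)
            self.adj[v].add(u)

    def estimate(self):
        t, m = self.t, self.m
        if t <= m:
            return self.tau           # sample is exact: no rescaling
        xi = t * (t - 1) * (t - 2) / (m * (m - 1) * (m - 2))
        return self.tau * xi
```

When the stream fits entirely in the reservoir the estimate is exact: feeding in all six edges of the complete graph on four vertices yields its four triangles.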
The author introduces Huohua, Two Sigma’s implementation of highly optimized time series operations in Spark.
A comparison of centrality measures for big graphs, the quality guarantees of approximation algorithms, and the trade-offs and scalability behaviors of distributed algorithms.
A Two Sigma quantitative engineer walks through a paper on the history of formulas for computing meta-statistics on any distribution.
Higher-order Attribute Contraction Schemes (HACS), a language for programming compilers, makes it possible to create a full compiler from a single source file.