  • Data Science

Gradient Sparsification for Communication-Efficient Distributed Optimization

Modern large-scale ML applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information such as stochastic gradients among different workers. In this paper, to reduce the communication cost, we propose a convex optimization formulation to minimize the coding length of stochastic gradients.
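
As a rough illustration of the general idea of gradient sparsification (not the paper's convex formulation), here is a minimal NumPy sketch of unbiased random sparsification; the function name, the `keep_fraction` parameter, and the magnitude-proportional probability rule are assumptions for illustration only:

```python
import numpy as np

def sparsify_gradient(grad, keep_fraction=0.05, rng=None):
    """Randomly drop gradient coordinates, rescaling survivors by 1/p_i
    so the sparsified vector equals the original in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    abs_g = np.abs(grad)
    total = abs_g.sum()
    if total == 0.0:
        return np.zeros_like(grad)
    # Keep probability proportional to magnitude, capped at 1, so the
    # expected number of survivors is roughly keep_fraction * len(grad).
    probs = np.minimum(1.0, keep_fraction * grad.size * abs_g / total)
    mask = rng.random(grad.size) < probs
    sparse = np.zeros_like(grad)
    sparse[mask] = grad[mask] / probs[mask]   # rescale to stay unbiased
    return sparse

# A worker would then transmit only the nonzero (index, value) pairs.
g = np.random.randn(10_000)
print(np.count_nonzero(sparsify_gradient(g)), "of", g.size, "coordinates kept")
```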

  • Data Science

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

The authors suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph, as well as lower bounds for several specific parallel optimization settings. They highlight gaps between lower and upper bounds on the oracle complexity, and cases where the “natural” algorithms are not known to be optimal.

  • Data Science

Sparse PCA from Sparse Linear Regression

Sparse Principal Component Analysis (SPCA) and Sparse Linear Regression (SLR) have a wide range of applications and have attracted attention as canonical examples of statistical problems in high dimension. A variety of algorithms have been proposed for both SPCA and SLR, but an explicit connection between the two had not been made. This paper shows how to efficiently transform a black-box solver for SLR into an algorithm for SPCA.
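
For reference, the two problems being connected, in their standard formulations (standard notation, not necessarily the paper's): SPCA seeks a sparse direction of maximum variance, while SLR seeks a sparse coefficient vector fitting a response.

```latex
% Sparse PCA: sparse direction of maximal variance of the covariance \Sigma
\max_{v \in \mathbb{R}^d} \; v^{\top} \Sigma v
\quad \text{s.t.} \quad \|v\|_2 = 1, \;\; \|v\|_0 \le k

% Sparse linear regression: sparse coefficients fitting the response y
\min_{\beta \in \mathbb{R}^d} \; \|y - X\beta\|_2^2
\quad \text{s.t.} \quad \|\beta\|_0 \le k
```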

  • Technology

A Secure Cloud with Minimal Provider Trust

Bolted is a new architecture for a bare-metal cloud that aims to give security-sensitive customers the same level of security and control they can obtain in their own private data centers.

  • Technology

Engineering with Open Source

The author walks through how to build a metrics system for a high-performance data platform, looking at some of the important factors to consider when choosing which open-source offerings to use.
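
The post does not prescribe specific tools; purely as an illustration, here is a minimal sketch of exposing application metrics with one common open-source choice, the Prometheus Python client (the metric names, port, and library choice are assumptions, not recommendations from the post):

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metrics for a data-platform request path.
REQUESTS = Counter("platform_requests_total", "Requests handled", ["status"])
LATENCY = Histogram("platform_request_seconds", "Request latency in seconds")

def handle_request():
    with LATENCY.time():                        # record wall-clock latency
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
    REQUESTS.labels(status="ok").inc()

if __name__ == "__main__":
    start_http_server(8000)   # scrape at http://localhost:8000/metrics
    while True:
        handle_request()
```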

  • Technology

Sundial: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System

Distributed transactions suffer from poor performance due to two major limiting factors. First, distributed transactions suffer from high latency because each of their accesses to remote data incurs a long network delay. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and low performance. The authors present Sundial, an in-memory distributed optimistic concurrency control protocol that addresses these two limitations.
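
Sundial validates transactions using logical leases on tuples; the following is a deliberately simplified, single-node toy of optimistic validation with read-lease extension, written from that high-level idea rather than the paper (the data structures, names, and abort rule are my own simplifications, not Sundial's actual protocol):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    value: object
    wts: int = 0   # logical lease start: timestamp of the last write
    rts: int = 0   # logical lease end: largest read timestamp granted

@dataclass
class Txn:
    reads: dict = field(default_factory=dict)    # key -> wts observed at read time
    writes: dict = field(default_factory=dict)   # key -> new value

def try_commit(db, txn):
    # Choose a commit timestamp after every observed read version and
    # past the read lease of every record the transaction overwrites.
    commit_ts = 0
    for observed_wts in txn.reads.values():
        commit_ts = max(commit_ts, observed_wts)
    for key in txn.writes:
        if key in db:
            commit_ts = max(commit_ts, db[key].rts + 1)

    # Validate reads: abort if a record changed since it was read,
    # otherwise extend its read lease to cover commit_ts.
    for key, observed_wts in txn.reads.items():
        rec = db[key]
        if rec.wts != observed_wts:
            return False
        rec.rts = max(rec.rts, commit_ts)

    # Install writes at commit_ts.
    for key, value in txn.writes.items():
        db[key] = Record(value, wts=commit_ts, rts=commit_ts)
    return True
```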

  • Data Science

Learning and Memorization

In the machine learning research community, it is generally believed that there is a tension between memorization and generalization. This paper examines the extent to which this tension exists, by exploring whether it is possible to generalize by memorizing alone.
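
One simple way to "learn by memorizing" is a lookup table that stores the majority training label for each observed pattern of a few binary features; here is a toy sketch (the function names, feature choice, and example task are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from collections import Counter, defaultdict

def fit_lut(X, y, feature_idx):
    """Memorize: map each pattern of the selected binary features to the
    majority label of the training rows that show that pattern."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[tuple(row[feature_idx])][label] += 1
    return {p: c.most_common(1)[0][0] for p, c in votes.items()}

def predict_lut(table, X, feature_idx, default=0):
    """Look up each row's pattern; fall back to `default` if it was never seen."""
    return np.array([table.get(tuple(row[feature_idx]), default) for row in X])

# Example: memorize an XOR of two features; held-out rows still hit
# already-seen patterns, so pure memorization happens to generalize here.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 16))
y = X[:, 0] ^ X[:, 1]
idx = np.arange(8)                       # table indexed by the first 8 features
table = fit_lut(X[:1500], y[:1500], idx)
print("held-out accuracy:", (predict_lut(table, X[1500:], idx) == y[1500:]).mean())
```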