Author: Shams Imam (Two Sigma)
Presented at: Two Sigma Open Source meetup, 11/23/2017
Abstract: One of the key challenges in developing a service-oriented architecture (SOA) is anticipating traffic patterns and scaling the number of running instances of services to meet demand. In many situations, it’s hard to know how much traffic a service will receive and when that traffic will come. A service may see no requests for several days in a row and then suddenly see thousands of requests per second. If developers underestimate peak traffic, their service can quickly become overwhelmed and unresponsive, and may even crash, resulting in constant human intervention and poor developer productivity. On the other hand, if they provision enough capacity for the peak upfront, the resources they allocate are wasted whenever there’s no traffic.
To improve resource utilization, many cluster management platforms provide auto-scaling features. These features tend either to scale at the machine/resource level (as opposed to the request level) or to defer to logic in the application layer. A better approach would be to run services when, and only when, there is traffic. Waiter is a distributed auto-scaler that delivers this kind of request-level auto-scaling. It requires no input or handling from applications and is agnostic to underlying cluster managers; it currently uses Mesos, but can easily run on top of Kubernetes or other solutions.
Another challenge with SOAs is enabling the evolution of service implementations without breaking downstream customers. On this front, Waiter supports service versioning for downstream consumers by running multiple, individually addressable versions of services. It automatically manages service lifecycles and reaps older versions after a period of inactivity. With a variety of unique features, Waiter is a compelling platform for applications across a broad range of industries.
Existing web services can run on Waiter without modification as long as they communicate over HTTP and support client requests being routed to arbitrary backends. Two Sigma has employed the platform in a variety of critical production contexts for over two years, with some use cases reaching hundreds of millions of requests per day.
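As a rough illustration of the request-level scaling and inactivity-based reaping described in the abstract, the following Python sketch derives an instance count from the number of outstanding requests and decides when an idle service version could be reaped. This is not Waiter's actual algorithm or API: the function names, the `concurrency_per_instance` and `max_instances` parameters, and the 30-minute idle timeout are all assumptions made for the example.

```python
def target_instances(outstanding_requests: int,
                     concurrency_per_instance: int = 1,
                     max_instances: int = 100) -> int:
    """Hypothetical helper: instances needed for the current request load.

    Returns 0 when there is no traffic, so an idle service holds no resources.
    """
    if outstanding_requests < 0:
        raise ValueError("outstanding_requests must be non-negative")
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")
    # Integer ceiling division: enough instances to cover every request.
    needed = (outstanding_requests + concurrency_per_instance - 1) // concurrency_per_instance
    # Cap the result so a traffic spike cannot request unbounded capacity.
    return min(needed, max_instances)


def should_reap(idle_seconds: float, idle_timeout_seconds: float = 1800.0) -> bool:
    """Hypothetical helper: reap a service version once it has been idle long enough."""
    return idle_seconds >= idle_timeout_seconds


if __name__ == "__main__":
    for load in (0, 5, 1000):
        print(load, "outstanding ->", target_instances(load, concurrency_per_instance=2))
    print("reap after one idle hour:", should_reap(3600))
```

The point of the sketch is only that the scaling input is the request load itself rather than machine-level metrics such as CPU or memory, which is what lets the instance count fall to zero when there is no traffic.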