
Open access to the Proceedings of the 2018 USENIX Annual Technical Conference is sponsored by USENIX.

Tributary: spot-dancing for elastic services with latency SLOs

Aaron Harlap and Andrew Chung, Carnegie Mellon University; Alexey Tumanov, UC Berkeley; Gregory R. Ganger and Phillip B. Gibbons, Carnegie Mellon University

https://www.usenix.org/conference/atc18/presentation/harlap

This paper is included in the Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC ’18).

July 11–13, 2018 • Boston, MA, USA

ISBN 978-1-939133-02-1


Tributary: spot-dancing for elastic services with latency SLOs

Aaron Harlap§∗ Andrew Chung§∗ Alexey Tumanov† Gregory R. Ganger§ Phillip B. Gibbons§

§Carnegie Mellon University †UC Berkeley

Abstract

The Tributary elastic control system embraces the uncertain nature of transient cloud resources, such as AWS spot instances, to manage elastic services with latency SLOs more robustly and more cost-effectively. Such resources are available at lower cost, but with the proviso that they can be preempted en masse, making them risky to rely upon for business-critical services. Tributary creates models of preemption likelihood and exploits the partial independence among different resource offerings, selecting collections of resource allocations that satisfy SLO requirements and adjusting them over time, as client workloads change. Although Tributary’s collections are often larger than required in the absence of preemptions, they are cheaper because of both lower spot costs and partial refunds for preempted resources. At the same time, the often-larger sets allow unexpected workload bursts to be absorbed without SLO violation. Over a range of web service workloads, we find that Tributary reduces cost for achieving a given SLO by 81–86% compared to traditional scaling on non-preemptible resources, and by 47–62% compared to the high-risk approach of the same scaling with spot resources.

1 Introduction

Elastic web services have been a cloud computing staple from the beginning, adaptively scaling the number of machines used over time based on time-varying client workloads. Generally, an adaptive scaling policy seeks to use just the number of machines required to achieve its Service Level Objectives (SLOs), which are commonly focused on response latency and ensuring that a given percentage (e.g., 95%) of requests are responded to in under a given amount of time [17, 28, 19]. Too many machines results in unnecessary cost, and too few results in excess customer dissatisfaction. As such, much research and development has focused on doing this well [20, 14, 11, 12, 26].

Elastic service scaling schemes generally assume independent and infrequent failures, which is a relatively safe assumption for high-priority allocations in private clouds and non-preemptible allocations in public clouds (e.g., on-demand instances in AWS EC2 [3]). This assumption enables scaling schemes to focus on client workload and server responsiveness variations in determining changes to the number of machines needed to meet SLOs.

∗Equal contribution

Modern clouds also offer transient, preemptible resources (e.g., EC2 Spot Instances [1]) at a discount of 70–80% [6], creating an opportunity for cheaper service deployments. But simply using standard scaling schemes fails to address the risks associated with such resources. Namely, preemptions should be expected to be more frequent than failures and, more importantly, preemptions often occur in bulk. Akin to co-occurring failures, bulk preemptions can cause traditional scaling schemes to have sizable gaps in SLO attainment.

This paper describes Tributary, a new elastic control system that exploits transient, preemptible resources to reduce cost and increase robustness to unexpected workload bursts. Tributary explicitly recognizes the bulk preemption risk, and it exploits the fact that preemptions are often not highly correlated across different pools of resources in heterogeneous clouds. For example, in AWS EC2, there is a separate spot market for each instance type in each availability zone, and researchers have noted that they often move independently: while preemptions within each spot market are correlated, across spot markets they are not [16]. To safely use preemptible resources, Tributary acquires collections of resources drawn from multiple pools, modified as resource prices change and preemptions occur, while endeavoring to ensure that no single bulk preemption would cause SLO violation. We refer to this dynamic use of multiple preemptible resource pools as spot-dancing.

AcquireMgr is Tributary’s component that decides the resource collection’s makeup. It works with any traditional scaling policy that determines (reactively or predictively) how many cores or machines are needed for each successive period of time, based on client load variation. AcquireMgr decides which instances will provide sufficient likelihood of meeting each time period’s target at the lowest expected cost. Its probabilistic algorithm combines resource cost and preemption probability predictions for each pool to decide how many resources to include from each pool, and at what price to bid for any new resources (relative to the current market price).


Given that a preemption occurs when a market’s spot price exceeds the bid price given at resource acquisition time, AcquireMgr can affect the preemption probability via the delta between its bid price and the current price, informed by historical pricing trends. In our implementation, which is specialized to AWS EC2, the predictions use machine learning (ML) models trained on historical EC2 spot price data. The expected cost computation takes into account EC2’s policy of partial refunds for preempted instances, which often results in AcquireMgr choosing high-risk instances and achieving even bigger savings than just the discount for preemptibility.

In addition to the expected cost savings, Tributary’s spot-dancing provides a burst tolerance benefit. Any elastic control scheme has some reaction delay between an unexpected burst and any resulting addition of resources, which can cause SLO violations. Because Tributary’s resource collection is almost always bigger than the scaling policy’s most recent target, in order to accommodate bulk preemptions, extra resources are often available to handle unexpected bursts. Of course, traditional elastic control schemes can also acquire extra resources as a buffer against bursts, but only at a cost, whereas the extra resources when using Tributary are a bonus side-effect of AcquireMgr’s robust cost-saving scheme.

Results for four real-world web request arrival traces and real AWS EC2 spot market data demonstrate Tributary’s cost savings and SLO benefits. For each of three popular scaling policies (one reactive and two predictive), Tributary’s exploitation of AWS spot instances reduces cost by 81–86% compared to traditional scaling with on-demand instances for achieving a given SLO (e.g., 95% of requests below 1 second). Compared to unsafely using traditional scaling with spot instances (AWS AutoScale [2]) instead of on-demand instances, Tributary reduces cost by 47–62% for achieving a given SLO. Compared to other recent systems’ policies for exploiting spot instances to reduce cost [24, 16], Tributary provides higher SLO attainment at significantly lower cost.

This paper makes four primary contributions. First, it describes Tributary, the first resource acquisition system that takes advantage of preemptible cloud resources for elastic services with latency SLOs. Second, it introduces AcquireMgr algorithms for composing resource collections of preemptible resources cost-effectively, exploiting the partial refund model of EC2’s spot markets. Third, it introduces a new preemption prediction approach that our experiments with EC2 spot market price traces show is significantly more accurate than previous preemption predictors. Fourth, we show that Tributary’s approach yields significant cost savings and robustness benefits relative to other state-of-the-art approaches.

2 Background and Related Work

Elastic services dynamically acquire and release machine resources to adapt to time-varying client load. We distinguish two aspects of elastic control: the scaling policy and the resource acquisition scheme. The scaling policy determines, at any point in time, how many resources the service needs in order to satisfy a given SLO. The resource acquisition scheme determines which resources should be allocated and, in some cases, aspects of how (e.g., bid price or priority level). This section discusses AWS EC2 spot instances and resource acquisition strategies to put Tributary and its new approach to resource acquisition into context.

2.1 Preemptible resources in AWS EC2

In addition to non-preemptible, or reliable, resources, most cloud infrastructures offer preemptible resources as a way to increase utilization in their datacenters. Preemptible resources are made available, on a best-effort basis, at decreased cost (in for-pay settings) and/or at lower priority (in private settings). This subsection describes preemptible resources in AWS EC2, both to provide a concrete example and because Tributary and most related work specialize to EC2 behavior.

EC2 offers “on-demand instances”, which are reliable VMs billed at a flat per-second rate. EC2 also offers the same VM types as “spot instances”, which are preemptible but are usually billed at prices significantly lower (70–80%) than the corresponding on-demand price. EC2 may preempt spot instances at any time, thus presenting users with a trade-off between reliability (on-demand) and cost savings (spot).

There are several properties of the AWS EC2 spot market behavior that affect customer cost savings and the likelihood of instance preemption. (1) Each instance type in each availability zone has a unique AWS-controlled spot market associated with it, and AWS’s spot markets are not truly free markets [9]. (2) Price movements among spot markets are not always correlated, even for the same instance type in a given region [23]. (3) Customers specify a bid in order to acquire a spot instance. The bid is the maximum price a customer is willing to pay for an instance in a specific spot market; once a bid is accepted by AWS, it cannot be modified. (4) A customer is billed the spot market price (not the bid price) for as long as the spot market price for the instance does not exceed the bid price, or until the customer releases the instance voluntarily. (5) As of Oct 2nd, 2017, AWS charges for the usage of an EC2 instance up to the second, with one exception: if the spot market price of an instance exceeds the bid price during its first hour, the customer is refunded fully for its usage. No refund is given if the spot instance is revoked in any subsequent hour. We define the period where preemption makes the instance free as the preemption window.

When using EC2 spot instances, the bidding strategy plays an important role in both cost and preemption probability. Many bidding strategies for EC2 spot instances have been studied [9, 33, 30]. The most popular strategy by far is to bid the on-demand price to minimize the odds of preemption [23, 21], since AWS charges the market price rather than the bid price.

2.2 Cloud Resource Acquisition Schemes

Given a target resource count from a scaling policy, a resource acquisition scheme decides which resources to acquire based on attributes of resources (e.g., bid price or priority level). Many elastic control systems assume that all available resources are equivalent, such as would be true in a homogeneous cluster, which makes the acquisition scheme trivial. But some others address resource selection and bidding strategy aspects of multiple available options. Tributary’s AcquireMgr employs novel resource acquisition algorithms, and we discuss related work here.

AWS AutoScale [2] is a service provided by AWS that maintains the resource footprint according to the target determined by a scaling policy. At initialization time, if using on-demand instances, the user specifies an instance type and availability zone. Whenever the scaling target changes, AutoScale acquires or releases instances to reach the new target. If using spot instances, the user can use a so-called “spot fleet” [4] consisting of multiple instance type and availability zone options. In this case, the user configures AutoScale to use one of two strategies. The lowestPrice strategy always selects the option with the cheapest current spot price. The diversified strategy uses an equal number of instances from each option. Tributary bids aggressively and diversifies based on predicted preemption rates and observed inter-market correlation, resulting in both higher SLO attainment and lower cost than AutoScale.

Kingfisher [26] uses a cost-aware resource acquisition scheme based on integer linear programming to determine a service’s resource footprint among a heterogeneous set of non-preemptible instances with fixed prices. Tributary also selects from among heterogeneous options, but addresses the additional challenges and opportunities introduced by embracing preemptible transient resources. Several works have explored ways of selecting and using spot instances. HotSpot [27] is a resource container that allows an application to suspend and automatically migrate to the most cost-efficient spot instance. While HotSpot works for single-instance applications, it is not suitable for elastic services, since its migrations are not coordinated and it does not address bulk preemptions.

SpotCheck [25] proposes two methods of selecting spot markets to acquire instances in, while always bidding at a configurable multiple of the spot instance’s corresponding on-demand price. The first method is greedy cheapest-first, which picks the cheapest spot market. The second method is stability-first, which chooses the most price-stable market based on past market price movement. SpotCheck relies on VM migration and hot spares (on-demand or otherwise) to address revocations, which incurs additional cost, while Tributary uses a diverse pool of spot instances to mitigate revocation risk.

BOSS [32] hosts key-value stores on spot instances by exploiting price differences across pools in different datacenters and creating an online algorithm to dynamically size pools within a constant bound of optimality. Tributary also constructs its resource footprint from different pools, within and possibly across datacenters. Whereas BOSS assumes non-changing storage capacity requirements, Tributary dynamically scales its resource footprint to maintain the specified latency SLO while adapting to changes in client workload.

Wang et al. [31] explore strategies to decide whether, in the face of changing application behavior, it is better to reserve discounted resources over longer periods or lease resources at normal rates on a shorter-term basis. Their solution combines on-demand and “reserved” (long-term rental at a discounted price) instances, neither of which are ever preempted by Amazon.

ExoSphere [24] is a virtual cluster framework for spot instances. Its instance acquisition scheme is based on market portfolio theory, relying on a specified risk-averseness parameter (α). ExoSphere formulates the return of a spot instance acquisition as the difference between the on-demand cost and the expected cost based on past spot market prices. It then tries to maximize the return of a set of instance allocations with respect to risk, considering market correlations and α, determining the fraction of desired resources to allocate in each spot market being considered. For a given virtual cluster size, ExoSphere will acquire the corresponding number of instances from each market at the on-demand price. Unsurprisingly, since it was created for a different usage model, ExoSphere’s scheme is not a great fit for elastic services with latency SLOs. We implement ExoSphere’s scheme and show in Section 5.6 that Tributary achieves lower cost, because it bids aggressively (resulting in more preemptions), and higher SLO attainment, because it explicitly predicts preemptions and selects resource sets based on sufficient tolerance of bulk preemptions.

Proteus [16] is an elastic ML system that combines on-demand resources with aggressive bidding on spot resources to complete batch ML training jobs faster and cheaper. Rather than bidding the on-demand price, it bids close to the market price and aggressively selects spot markets and bid prices that it predicts will result in preemption, in hopes of getting many partial hours of free resources. The few on-demand resources are used to maintain a copy of the dynamic state as spot instances come and go, and acquisitions are made and used to scale the parallel computation whenever they would reduce the average cost per unit of work. Although Tributary uses some of the same mindset (aggressive use of preemptible resources), elastic services with latency SLOs are different from batch processing jobs: elastic services have a target resource quantity for each point in time, and having fewer usually leads to SLO violations, while having more often provides no benefit. Unsurprisingly, therefore, we find that Proteus’s scheme is not a great fit for such services. We implement Proteus’s acquisition scheme and show in Section 5.6 that Tributary achieves much higher SLO attainment, because it understands the resource target and explicitly uses diversity to mitigate bulk preemption effects. Tributary also uses a new and much more accurate preemption predictor.

3 Elastic Control in Tributary

AcquireMgr is Tributary’s resource acquisition component, and its approach differentiates Tributary from previous elastic control systems. It is coupled with a scaling policy, any of many popular options, which provides the time-varying resource quantity target based on client load. AcquireMgr uses ML models to predict the preemption probability of resources and exploits the relative independence of AWS spot markets to account for potential bulk preemptions by acquiring a diverse mix of preemptible resources collectively expected to satisfy the user-specified latency SLO. This section describes how AcquireMgr composes the resource mix while targeting minimal cost.

Resource Acquisition. AcquireMgr interacts with AWS to request and acquire resources. To do so, AcquireMgr builds sets of request vectors. Each request vector specifies the instance type, availability zone, bid price, and number of instances to acquire. We call this an allocation request. An allocation is defined as a set of instances of the same type acquired at the same time and price. AcquireMgr’s total footprint, denoted by the variable A, is a set of such allocations. Resource acquisition decisions are made under four conditions: (1) a periodic (one-minute) clock event fires, (2) an allocation reaches the end of its preemption window, (3) the scaling policy specifies a change in resource requirement, and/or (4) a preemption occurs. We term these conditions decision points.

AcquireMgr abstracts away the resource type being optimized for. For the workloads described in this paper, virtual CPUs (VCPUs) are the bottleneck resource; however, it is possible to optimize for memory, network bandwidth, or other resource types instead. A service using Tributary provides its resource scaling characteristics to AcquireMgr in the form of a utility function υ(). This utility function maps the number of resources to the percentage of requests expected to meet the target latency, given the load on the web service. The shape of a utility function is service-specific and depends on how the service scales, for the expected load, with respect to the number of resources. In the simplest case, where the web service is embarrassingly parallel, the utility function is linear in the number of resources offered until 100% of the requests are expected to be satisfied, at which point the function turns into a horizontal line. As a concrete example, if an embarrassingly parallel service specifies that 100 instances are required to handle 10000 requests per second without any of the requests missing the target latency, a linear utility function will assume that 50 instances will allow the system to meet the target latency on 50% of the requests. Tributary allows applications to customize the utility function so as to accommodate the resource requirements of applications with various scaling characteristics.
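For concreteness, here is a minimal sketch of such a linear utility function in Python; the function and parameter names are ours, for illustration, and are not part of Tributary's interface:

```python
def linear_utility(resources, resources_needed):
    """υ(r) for an embarrassingly parallel service: the fraction of
    requests expected to meet the latency target with `resources`
    units, capped at 100% once the full requirement is met."""
    return min(1.0, resources / resources_needed)

# Per the example above: 100 instances fully handle the load,
# so a linear utility assumes 50 instances satisfy 50% of requests.
assert linear_utility(50, 100) == 0.5
assert linear_utility(150, 100) == 1.0
```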

In addition to providing υ(), the service also provides the application’s target SLO, in terms of the percentage of requests required to meet the target latency. By exposing the target SLO as a customizable input, Tributary allows the application to control the cost-SLO tradeoff. Upon receiving this information, AcquireMgr acquires enough resources to meet the SLO in expectation while optimizing for expected cost. In deciding which resources to acquire, AcquireMgr uses the prediction models described in Sec. 3.1 to predict the probability that each allocation would be preempted. Using these predictions, AcquireMgr can compute the expected cost and the expected utility of a set of allocations (Sec. 3.2). AcquireMgr greedily acquires allocations until the expected utility is greater than or equal to the SLO percentage requirement (Sec. 3.3).

3.1 Prediction Models

When acquiring spot instances on AWS, there are three configurable parameters that affect preemption probability: instance type, availability zone, and bid price. This section describes the models used by AcquireMgr to predict allocation preemption probabilities.

Previous work [16] proposed taking the historical median probability of preemption based on the instance type, availability zone, and bid price. This approach does not consider time of day, day of week, price fluctuations, and several other factors that affect preemption probabilities. AcquireMgr trains ML models considering such features to predict resource reliability.

Training Data and Feature Engineering. The prediction models are trained ahead of time with data derived from AWS spot market price histories. Each sample in the training dataset is a hypothetical bid, and the target variable, preempted, of our model is whether or not an instance acquired with the hypothetical bid is preempted before the end of its preemption window (1 hr). We use the following method to generate our dataset: for each instance and bid delta (bid price above the market price, with range [0.00001, 0.2]), we generate a set of hypothetical bids, with each bid starting at a random point in the spot market history. For each bid, we look forward in the spot market price history. If the market price of the instance rises above the bid price at any point within the hour, we mark the sample as preempted. For each historical bid, we also record the ten prices immediately prior to the random starting point and their timestamps.
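A sketch of this labeling procedure, assuming the price history is a time-sorted list of (timestamp, price) pairs; the function and field names are ours, not Tributary's:

```python
import random

def make_sample(history, bid_delta, window_secs=3600):
    """Create one hypothetical bid at a random point in a spot price
    history and label it 1 if the market price exceeds the bid at any
    point within the one-hour preemption window, else 0."""
    start = random.randrange(10, len(history) - 1)  # keep 10 prior prices
    t0, market_price = history[start]
    bid = market_price + bid_delta
    preempted = any(price > bid
                    for t, price in history[start + 1:]
                    if t - t0 <= window_secs)
    return {
        "bid_delta": bid_delta,
        "market_price": market_price,
        "prior_prices": history[start - 10:start],  # model inputs
        "preempted": int(preempted),                # target variable
    }
```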

To increase prediction accuracy, AcquireMgr engineers features from AWS spot market price histories. Our engineered features include: (1) spot market price; (2) average spot market price; (3) bid delta; (4) frequency of spot market price changes within the past hour; (5) magnitude of spot market price fluctuations within the past two, ten, and thirty minutes; (6) day of the week; (7) time of day; (8) whether the time of day falls within working hours (a separate feature for each of three time zones). These features allow AcquireMgr to construct a more complex prediction model, leading to higher prediction accuracy (Sec. 5.7).

Model Design. To capture the temporal nature of the EC2 spot market, AcquireMgr uses a Long Short-Term Memory Recurrent Neural Network (LSTM RNN) to predict instance preemptions. The LSTM RNN is a popular model for workloads where the ordering of training examples is important to prediction accuracy [29]. Examples of such workloads include language modeling, machine translation, and stock market prediction. Unlike feed-forward neural networks, LSTM models take previous inputs into account when classifying input data. Modeling the EC2 spot market as a sequence of events significantly improves prediction accuracy (Sec. 5.7). The output of the model is the probability of the resource being preempted within the hour.
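A minimal sketch of such a model in Keras; the layer sizes here are our assumptions (the paper does not specify the architecture), with the ten prior prices as the input sequence and the engineered per-bid features concatenated after the LSTM:

```python
import tensorflow as tf

seq_in = tf.keras.Input(shape=(10, 2), name="prior_prices")   # (price, timestamp)
feat_in = tf.keras.Input(shape=(8,), name="engineered_features")

h = tf.keras.layers.LSTM(32)(seq_in)                  # temporal market state
h = tf.keras.layers.Concatenate()([h, feat_in])
h = tf.keras.layers.Dense(16, activation="relu")(h)
out = tf.keras.layers.Dense(1, activation="sigmoid")(h)  # P(preempted in 1 hr)

model = tf.keras.Model([seq_in, feat_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```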

3.2 AcquireMgr

To make decisions about which resources to acquire or release, AcquireMgr computes the expected cost and expected utility of the set of instances it is considering at each decision point. Calculations of the expected values are based on probabilities of preemption computed by AcquireMgr’s trained LSTM model. This section describes how AcquireMgr computes these values.

Definitions. To aid in discussion, we first define the notion of a resource pool. Each instance type in each availability zone forms its own resource pool; in the context of EC2 spot instances, each such resource pool has its own spot market. Given a set of allocations A, where A is formulated as a jagged array, let A_i be defined as the ith entry of A, corresponding to an array of allocations from resource pool i sorted by bid price in ascending order. We define allocation a_{i,j} as an allocation from resource pool i (i.e., a_{i,j} ∈ A_i) with the jth lowest bid in that resource pool. We further denote p_{i,j} as the bid price of allocation a_{i,j}, β_{i,j} as the probability of preemption of allocation a_{i,j}, and t_{i,j} as the time remaining in the preemption window for allocation a_{i,j}. Note that p_{i,j} ≥ p_{i,j−1}, which implies β_{i,j−1} ≥ β_{i,j}. Finally, we define a size(A) function that returns the size of A’s major dimension. See Table 1 for a symbol reference.

A          Set of allocations, as a jagged array
A_i        Sorted array of allocations from resource pool i
a_{i,j}    Set of instances allocated from resource pool i
β_{i,j}    Probability that allocation a_{i,j} is preempted
t_{i,j}    Time left in the preemption window for a_{i,j}
k_{i,j}    Number of instances in allocation a_{i,j}
P_{i,j}    Market price of allocation a_{i,j}
p_{i,j}    Bid price of allocation a_{i,j}
size(y)    Size of the major dimension of array y
resc(y)    Total number of resources in y
λ_i        Regularization term for diversity
P(R = r)   Probability that r resources remain in A
υ(r)       Utility of having r resources remain in A
V_A        Expected utility of a set of allocations A
C_A        Expected cost of a set of allocations ($)

Table 1: Summary of parameters used by AcquireMgr.
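These definitions map naturally onto a jagged array of allocation records. A minimal sketch in Python (the type and field names are ours, for illustration; later sketches in this section reuse it):

```python
from dataclasses import dataclass

@dataclass
class Allocation:
    """One allocation a_{i,j}: instances acquired together from
    resource pool i at one bid price."""
    k: int          # k_{i,j}: number of instances
    bid: float      # p_{i,j}: bid price
    market: float   # P_{i,j}: market price at acquisition
    beta: float     # β_{i,j}: predicted preemption probability
    t_left: float   # t_{i,j}: time left in the preemption window

# A is jagged: A[i] holds pool i's allocations sorted by ascending bid
# (so beta is non-increasing along each A[i]).
A = [
    [Allocation(4, 0.11, 0.10, 0.6, 1.0), Allocation(2, 0.15, 0.10, 0.3, 1.0)],
    [Allocation(8, 0.21, 0.20, 0.5, 0.5)],
]

def resc(allocations):
    """resc(A): total number of resources in a jagged set of allocations."""
    return sum(alloc.k for pool in allocations for alloc in pool)
```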

Expected Cost. The total expected cost for a given footprint A is calculated as the sum over the expected costs of the individual allocations, C_A[a_{i,j}]:

\[ C_A = \sum_{i=1}^{size(A)} \sum_{j=1}^{size(A_i)} C_A[a_{i,j}] \tag{1} \]

AcquireMgr calculates the expected cost of an allocation by considering the probability of preemption within the preemption window, β_{i,j}, for a given allocation a_{i,j} at a given bid delta. There are exactly two possibilities: an allocation will either be preempted, with probability β_{i,j}, or it will reach the end of its preemption window in the remaining t_{i,j} minutes, with probability 1 − β_{i,j}, in which case we would voluntarily release the allocation. The expected cost can then be written as:

\[ C_A[a_{i,j}] = (1-\beta_{i,j}) \cdot P_{i,j} \cdot k_{i,j} \cdot t_{i,j} + \beta_{i,j} \cdot 0 \cdot k_{i,j} \cdot t_{i,j} \tag{2} \]

where k_{i,j} is the number of instances in the allocation and P_{i,j} is the market price for instances of type i at the time of acquisition.
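In code, Eqs. (1) and (2) collapse to a few lines; a sketch over the Allocation records above, where the preempted branch contributes zero cost because of the refund:

```python
def expected_cost(A):
    """Eqs. (1)-(2): with probability 1 - beta the allocation survives
    its preemption window and is billed at the market price for its k
    instances; with probability beta it is preempted and refunded."""
    return sum((1.0 - alloc.beta) * alloc.market * alloc.k * alloc.t_left
               for pool in A for alloc in pool)
```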

Expected Utility. In addition to computing the expected cost of a set of allocations, AcquireMgr computes its expected utility. The expected utility is the expected percentage of requests that will meet the latency target given the set of allocations A. Expected utility takes into account the probability of allocation preemptions, providing AcquireMgr with a metric for quantifying the expected contribution that each allocation should make toward the resource target. The expected utility V_A of the set of allocations A is calculated as follows:

\[ V_A = \sum_{r=0}^{resc(A)} P(R = r) \cdot \upsilon(r) \tag{3} \]

where P(R) is the probability mass function of the discrete random variable R that denotes the number of resources not preempted within the next hour, υ is the utility function provided by the service, and resc(A) is the function that reports the number of resources in a set of allocations A. resc(A) computes the total amount of resources in A, while size(A) only computes the size of A’s major dimension.

Eq. 3 computes the expected utility over the next hour given a workload, as though Tributary had just bid for all of its allocations. This works, even though there will usually be complex overlapping expiration windows across an hour, because the computation only needs to hold until it is redone at the next decision point, which is never more than a minute away. To derive P(R), AcquireMgr starts with the original set of allocations A and generates all possible subsets of A. Each possible subset S ⊆ A marks some allocations in A as preempted (those in S) and the remaining allocations as not preempted (those not in S). To formalize this notion, we define the indicator variable d_{i,j}, where d_{i,j} = 1 if allocation a_{i,j} ∈ S and d_{i,j} = 0 otherwise.

To compute the probability P(S) that S is the set of preempted resources, AcquireMgr separates all allocations by resource pool, as each resource pool within AWS has its own spot market. We leverage the following localizing property: within each resource pool A_i, the probability of preempting an allocation a_{i,j} depends only on whether the allocation with the next-lowest bid price in the same resource pool, a_{i,j−1}, is preempted. Note that P(a_{i,1}) = β_{i,1} for allocation a_{i,1} in every resource pool i. Consider two allocations a_{i,j}, a_{i,j−1} ∈ A from resource pool A_i. We observe the following properties: (1) a_{i,j} cannot be preempted unless a_{i,j−1} is preempted, (2) the probability that both a_{i,j} and a_{i,j−1} are preempted is the probability that a_{i,j} is preempted, and (3) the probability that a_{i,j} is preempted without a_{i,j−1} being preempted is 0. With Bayes’ rule, we observe that:

\[ P(a_{i,j} \mid a_{i,j-1}) = \frac{P(a_{i,j} \wedge a_{i,j-1})}{P(a_{i,j-1})} = \frac{\beta_{i,j}}{\beta_{i,j-1}}. \tag{4} \]

Thus, for an allocation a_{i,j} given a subset S ⊆ A,

\[ P(a_{i,j} \mid a_{i,j-1}) = \begin{cases} 0 & \text{if allocation } a_{i,j-1} \notin S, \\ \beta_{i,j}/\beta_{i,j-1} & \text{otherwise.} \end{cases} \tag{5} \]

Tributary further introduces a regularization term λ_i to encourage bidding in markets with low correlation. Spreading instances across weakly correlated markets is important for avoiding high-risk footprints: if the resource footprint has too many instances from correlated resource pools, Tributary becomes exposed to losing too many resources to a correlated price spike, potentially causing an SLO violation. To obtain price correlations across spot markets, we periodically maintain fixed-size moving windows of spot market prices and compute the Pearson correlation between each pair of spot markets. When computing expected utility, Tributary increases the preemption probability β_{i,j} of an allocation in A_i by λ_i:

\[ \lambda_i = \gamma \cdot \sum_{l=1}^{size(A)} \rho_{i,l} \cdot \frac{resc(A_i) + resc(A_l)}{2 \cdot resc(A)} \tag{6} \]

where ρ_{i,l} is the Pearson correlation between resource pools i and l, and γ ∈ ℝ ≥ 0 is a configurable penalty multiplier. Essentially, we add a weighted penalty to an allocation based on its Pearson correlation scores with the rest of our resources in different resource pools. In our experiments, we set γ = 0.01. The regularization term leads Tributary to create a diversified resource pool, reducing the probability that a significant portion of its resources is preempted simultaneously. Maintaining the majority of the resource pool with high probability at any point in time allows Tributary to avoid SLO violations with high probability.
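As a sketch, Eq. (6) can be computed from a precomputed matrix `rho` of pairwise Pearson correlations between pool price windows (again over the Allocation records above; γ = 0.01 as in our experiments):

```python
def diversity_penalty(i, A, rho, gamma=0.01):
    """Eq. (6): the penalty λ_i added to pool i's preemption
    probabilities, weighted by pool i's Pearson correlation with each
    pool l and by the share of resources the two pools hold."""
    def resc_pool(pool):
        return sum(alloc.k for alloc in pool)
    total = sum(resc_pool(pool) for pool in A)  # resc(A)
    return gamma * sum(rho[i][l] * (resc_pool(A[i]) + resc_pool(A[l]))
                       / (2.0 * total)
                       for l in range(len(A)))
```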

We denote by P(S) the probability that S is the set of resources preempted from A. AcquireMgr computes it by taking the product of the conditional probabilities of each allocation having the outcome specified in S. If the allocation is preempted (d_{i,j} = 1), the conditional probability of the allocation being preempted, P(a_{i,j} | a_{i,j−1}), is used; otherwise (d_{i,j} = 0), the product uses the conditional probability of the allocation not being preempted, 1 − P(a_{i,j} | a_{i,j−1}).

\[ P(S) = \prod_{i=1}^{size(A)} \prod_{j=1}^{size(A_i)} \big( d_{i,j} \cdot P(a_{i,j} \mid a_{i,j-1}) + (1-d_{i,j}) \cdot (1 - P(a_{i,j} \mid a_{i,j-1})) \big) \tag{7} \]

Finally, AcquireMgr formulates the probability P(R = r) that r resources remain after preemption (Eq. 3) as the sum of the probabilities of all sets S in which the number of resources not preempted equals r:

\[ P(R = r) = \sum_{S \subseteq A,\; resc(S) = resc(A) - r} P(S) \tag{8} \]

which it uses to calculate the expected utility of a set of allocations A (Eq. 3).
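For intuition: since a_{i,j} can be preempted only if a_{i,j−1} is, the preempted set within each pool is always a prefix of its ascending-bid order, and chaining Eq. (5) telescopes to P(exactly the m lowest bids preempted) = β_{i,m} − β_{i,m+1} (taking β_{i,0} = 1 and β_{i,n+1} = 0). A brute-force sketch of Eqs. (3), (7), and (8) using this observation (names are ours; the cost is exponential in the number of pools, as discussed next):

```python
from itertools import product

def expected_utility(pools, utility):
    """pools[i] is a list of (k_ij, beta_ij) pairs sorted by ascending
    bid (beta non-increasing); `utility` is the service's υ(r)."""
    per_pool = []
    for pool in pools:
        betas = [1.0] + [b for _, b in pool] + [0.0]
        outcomes = []
        for m in range(len(pool) + 1):            # m lowest bids preempted
            surviving = sum(k for k, _ in pool[m:])
            outcomes.append((surviving, betas[m] - betas[m + 1]))
        per_pool.append(outcomes)
    V = 0.0
    for combo in product(*per_pool):              # independent across pools
        r = sum(surviving for surviving, _ in combo)
        p = 1.0
        for _, prob in combo:
            p *= prob                              # Eq. (7)
        V += p * utility(r)                        # Eqs. (3) and (8)
    return V

# Example: two pools and a linear utility that needs 12 resources.
pools = [[(4, 0.6), (2, 0.3)], [(8, 0.5)]]
print(expected_utility(pools, lambda r: min(1.0, r / 12)))
```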

Computational tractability. The cost of AcquireMgr’s algorithm grows exponentially with the number of spot markets considered. When considering more markets, it is possible to reduce computational complexity by grouping similar, correlated spot markets and performing revocation analysis with a representative market per group. Although this would potentially decrease the precision of the preemption analysis, it would allow AcquireMgr to further improve performance by considering a larger number of markets.

[Figure 1: timelines for (b) Tributary and (c) AutoScale, plotting rate of requests against time (min), with allocations A–E and 1–4 shown per the legend in (a): 8 c4.large in us-west-2c, 2 c4.2xlarge in us-west-2a, 4 c4.xlarge in us-west-2a, and 4 c4.xlarge in us-west-2c, with markers for evictions and terminations.]

Figure 1: Figures (b) and (c) show how Tributary and AutoScale handle a sample workload, respectively. Figure (a) is the legend for (b) and (c), color-coding each allocation. The black dotted lines in (b) and (c) signify the request rates over time. At minute 15, the request rate unexpectedly spikes, and AutoScale experiences “slow” requests until it completes integrating the additional resources of allocation 3. Tributary, meanwhile, had extra resources meant to address preemption risk in allocation C, providing a natural buffer of resources that avoids “slow” requests during the spike. At minute 35, when the request rate decreases, Tributary terminates B, since it believes that B has the lowest probability of getting the free partial hour. It does not terminate D, since D has a high probability of eviction and is likely to be free; it also does not terminate C, since it needs to maintain resources. AutoScale, on the other hand, terminates both 2 and 3, incurring partial cost. At minute 52, the request rate increases, and Tributary again benefits from the extra buffer while AutoScale misses its latency SLO. In this example, Tributary has fewer “slow” requests and achieves lower cost than AutoScale, because AutoScale pays for 3 and for the partial hour of both 1 and 2, while Tributary only pays for A and the partial hour of B, since C and D were preempted and incur no cost.


3.3 Scaling Out

Resource Acquisition. When Tributary starts, the user specifies the target SLO, in terms of the percentage of requests that must respond within a certain latency, for Tributary to target. AcquireMgr uses this target SLO to acquire resources. At each decision point, AcquireMgr’s objective is to acquire resources until the expected utility V_A is greater than or equal to the target SLO. If the expected utility is already at or above the target SLO, no action is taken; otherwise, AcquireMgr computes the expected cost (Eq. 2) and the expected utility (Eq. 3) of the current set of allocations. AcquireMgr then calculates the number of missing resources (M) required to meet the target SLO and builds a set of possible allocations (Λ) consisting of allocations from different resource pools at different bid prices (from $0.0001 to $0.2 above the current price). For each possible allocation Λ_i, AcquireMgr records the new expected utility divided by the new expected cost of A ∪ Λ_i, choosing the allocation Λ_chosen that maximizes this value. AcquireMgr continues to add possible allocations until it achieves the target SLO in expectation.
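A sketch of this greedy loop, with the footprint flattened to a list for brevity; `candidate_allocations` and the two expectation helpers (Eqs. (1)-(3)) are assumed to exist:

```python
def scale_out(A, target_slo, candidate_allocations,
              expected_utility, expected_cost):
    """Greedily add the candidate allocation that maximizes expected
    utility per expected dollar until the target SLO is met in
    expectation (Sec. 3.3)."""
    while expected_utility(A) < target_slo:
        best = max(candidate_allocations(A),      # pools x bid deltas
                   key=lambda c: expected_utility(A + [c]) /
                                 expected_cost(A + [c]))
        A = A + [best]
    return A
```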

Buffers of Transient Resources. To accommodate potential resource preemptions, Tributary inherently acquires more than the required amount of resources whenever any of its allocations has a preemption probability greater than zero, which is always the case with spot instances. The amount of additional resources acquired depends on the target SLO and the probabilities of allocation preemptions (Eq. 3). While the primary goal of these additional resources is to account for preemptions, they often have the added benefit of handling unexpected increases in load. Experiments with Tributary show that these resource buffers both increase the fraction of requests meeting latency targets and decrease cost (Sec. 5.3).

3.4 Scaling In

Aside from reacting to preemptions, Tributary also tries to scale in voluntarily. As described earlier, each allocation is considered only for the duration of its preemption window. When an allocation reaches the end of its preemption window, it is terminated and replaced with a new allocation if required. When resource requirements decrease, Tributary considers terminating the allocations least likely to be preempted. During this process, Tributary chooses the allocation with the least time remaining in the hour, computes the expected utility V_A without this allocation, and, if it is greater than the target SLO, terminates the allocation. Tributary continues trying to terminate allocations as long as V_A remains greater than the target SLO.
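The matching scale-in check, as a sketch under the same conventions:

```python
def scale_in(A, target_slo, expected_utility):
    """Repeatedly try to release the allocation with the least time
    left in its preemption window, as long as the expected utility of
    the remainder stays above the target SLO (Sec. 3.4)."""
    while len(A) > 1:
        victim = min(A, key=lambda alloc: alloc.t_left)
        rest = [alloc for alloc in A if alloc is not victim]
        if expected_utility(rest) > target_slo:
            A = rest    # safe to release: SLO still met in expectation
        else:
            break
    return A
```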

3.5 Example and Future Consideration

Example. Fig. 1 shows how Tributary and AutoScale handle a sample workload, including how the extra resources Tributary acquires to handle preemption events can also handle an unexpected request rate increase, and how aggressive allocation selection can get some resources for free due to preemptions.

Future. Tributary lowers cost and meets SLO requirements by taking advantage of low-cost spot instances and uncorrelated prices across different spot instance markets. Mass adoption of systems like Tributary could change these characteristics. While a detailed analysis of mass adoption’s potential effects on EC2 spot markets is outside the scope of this paper, we evaluate the effects of two potential changes to spot-market policies in Section 5.5.

4 Tributary Implementation

Figure 2 shows Tributary’s high-level system architecture. This section describes the main components, how they fit together, and how they interact with AWS.

Preemption Prediction Models. The prediction models are trained offline using TensorFlow [8] and deployed using TensorFlow Serving [7]. A separate model is used for each resource pool. To serve run-time predictions, Tributary launches a Prediction Serving Proxy that receives all prediction queries from AcquireMgr, forwards them to their respective models, aggregates the results, and returns the predictions to AcquireMgr.

Resource Footprint Management. In Tributary, AcquireMgr takes primary responsibility for managing the resource footprint. AcquireMgr acquires instances, terminates instances, and monitors AWS for instance preemption notifications. AcquireMgr considers modifying the resource footprint at every decision point, and it follows the procedure described in Sec. 3.3 when additional resources are needed. Once AcquireMgr selects a set of instances to acquire, it sends instance requests to AWS via boto.ec2 API calls. AWS responds with a set of spot request ids, which correspond to the EC2 instances allocated to AcquireMgr. Once the instances are in a running state, AcquireMgr sends the instance ids associated with the new instances to the Resource Manager. Instance removal follows a similar procedure.

Scaling Policy. The Scaling Policy component determines dynamic sizing of the resource target. Through a simple event-driven API, users can implement their own scaling policies that access metrics provided by the Monitoring Manager and specify the resource target.

Monitoring Manager (MonMgr). The Monitoring Manager orchestrates monitoring of service system resources. The Scaling Policy can register for metrics such as the total number of requests and the average CPU utilization of instances. The MonMgr queries requested metrics using AWS CloudWatch each monitoring period and forwards them to the scaling policy.

Resource Manager (ResMgr). The Resource Manager is a proxy for AcquireMgr. Using resource targets provided by the Scaling Policy, the ResMgr generates the utility function used by AcquireMgr to make resource acquisition decisions.[1] The ResMgr also receives instance allocations and termination notices from AcquireMgr and forwards them to the Service Manager.

[1] The process of constructing the utility function is described in Sec. 5.2.

Figure 2: Tributary architecture.

5 Evaluation

This section evaluates Tributary’s effectiveness. The results support a number of important findings: (1) Tributary’s exploitation of AWS spot market instances reduces cost by 81%–86% compared to on-demand instances and simultaneously decreases SLO latency misses; (2) compared to standard bidding policies for spot instances, Tributary reduces cost by up to 41% and decreases SLO latency misses by 31%–65%; (3) compared to extending those standard policies to use enough extra (buffer) resources to match Tributary’s number of SLO latency misses, Tributary reduces cost by 47%–62%; (4) Tributary outperforms state-of-the-art resource managers in running elastic services; (5) Tributary’s preemption prediction models improve accuracy significantly, resulting in 37% lower cost than previous prediction approaches.

5.1 Experimental Setup

Experimental Platform. We report results for use of three AWS EC2 spot instance types: c4.large, c4.xlarge, and c4.2xlarge. The results correspond to the us-west-2 region, which consists of three availability zones. Using the three instance types in each availability zone, our experiments involve nine resource pools.

Workload. The simulated workload uses a real-world trace for request arrival times, with each request consisting of the derivation of the PBKDF2 [18] key of a password. The calculation of a PBKDF2 key is CPU-heavy, with no network overhead and minimal memory overhead. With CPU performance being the bottleneck, the resource requirement can be characterized in requests-per-second-per-VCPU.

Environment. In the simulation framework, each instance is characterized by a number of VCPUs, and the request processing time is configured to the measured time for one request on an EC2 instance (≈100 ms). Each instance server maintains a queue of requests, and we simulate the queueing effects using the discrete event simulation library SimPy [22]. The simulation framework takes into account resource start-up time, with newly acquired instances not able to service requests for two hundred seconds following their launch.
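To illustrate the setup, a minimal SimPy sketch of one simulated instance (the arrival rate, queue wiring, and names are ours; the real framework additionally models the 200-second start-up delay and per-instance queue bounds):

```python
import simpy

SERVICE_TIME = 0.1   # ~100 ms per request, as measured on EC2

def instance(env, vcpus, queue, slow):
    """One server instance: `vcpus` parallel workers draining a queue;
    requests finishing more than 1 s after arrival are counted slow."""
    cpu = simpy.Resource(env, capacity=vcpus)

    def handle(arrival):
        with cpu.request() as req:
            yield req
            yield env.timeout(SERVICE_TIME)
        if env.now - arrival > 1.0:
            slow.append(arrival)

    while True:
        arrival = yield queue.get()
        env.process(handle(arrival))

def arrivals(env, queue, rate=15.0):
    while True:
        yield env.timeout(1.0 / rate)
        queue.put(env.now)

env = simpy.Environment()
queue, slow = simpy.Store(env), []
env.process(arrivals(env, queue))
env.process(instance(env, vcpus=2, queue=queue, slow=slow))
env.run(until=600)
print(f"slow requests: {len(slow)}")
```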

SLO and Scaling. The target service latency is set to one second, and we verified on EC2 that a VCPU can handle roughly 10 requests per second without violating the latency target. So, the requests-per-second-per-VCPU is ten, and the queue size per server instance is ten times the number of VCPUs in the instance. Tributary is not overly sensitive to the target latency setting.

[Figure 3: panel (a), ClarkNet Periodic [10], plots requests per second over 4000 minutes; panel (b), WITS Large Variation [15], plots requests per second over 1400 minutes.]

Figure 3: Traces used in system evaluation.


Traces. We use four real-world request arrival traces with differing characteristics. Berkeley is from the Berkeley Home IP proxy service, and ClarkNet is from the ClarkNet ISP’s HTTP servers [10]. Both exhibit a periodic, diurnal pattern. We use the first 2000 minutes of these two traces, which covers an entire period. WITS is a sampled trace from the Waikato Internet Traffic Storage (WITS) [15]. The trace lasts for roughly a day, from April 6th to April 7th of the year 2000. This trace exhibits large variation in request rates throughout the day, as can be seen in Fig. 3b. WorldCup98 is the arrival trace of the workload on the 1998 FIFA World Cup HTTP servers [10] on day 75 of the World Cup. All traces are scaled to have an average of 125 requests per second in order to generate sufficient load for the experiments.

5.2 Scaling Policies Evaluated

We implement three popular scaling policies to evaluate our system: Reactive, Predictive Moving Window Average (MWA), and Predictive Linear Regression (LR). The utility function provided by the service is linear for all three policies. We make this assumption since our workload is embarrassingly parallel; if a workload exhibits different scaling characteristics, a different utility function can be employed.

The Reactive Policy scales out immediately when the demand reported by the MonMgr is greater than what the available resources are able to handle. It scales in slowly (only after three minutes of low demand), as recommended by Gandhi et al. [12], to prevent premature scale-in in case the demand fluctuates widely over a short period of time. The MWA Policy maintains a sliding window of a fixed size, with each window entry consisting of the number of requests received in one monitoring period. The policy takes the average of the window entries to predict the number of requests in the next monitoring period. The policy then adjusts the utility and scaling functions according to the predicted number of requests, and reports the updated functions to the ResMgr to scale in expectation of future requests. The LR Policy also maintains a sliding window of a fixed size, but rather than using the average of the window for prediction, the policy performs linear regression on the data points in the window to estimate the expected number of requests in the next monitoring period. Our experiments show that, regardless of the scaling policy used, Tributary beats its competitors on both service latency attainment and cost.
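The two predictive forecasts are simple; a sketch (window contents are illustrative):

```python
import numpy as np

def predict_mwa(window):
    """MWA: next period's request count = mean of the sliding window."""
    return float(np.mean(window))

def predict_lr(window):
    """LR: fit a line to the window and extrapolate one period ahead."""
    slope, intercept = np.polyfit(np.arange(len(window)), window, 1)
    return slope * len(window) + intercept

window = [110, 120, 135, 150, 160]   # requests per monitoring period
print(predict_mwa(window))            # 135.0
print(predict_lr(window))             # ~174: the upward trend continues
```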

5.3 Improvements with Tributary

Here, we evaluate Tributary’s ability to reduce cost and latency target misses relative to AutoScale.

AWS AutoScale. AWS AutoScale (Sec. 2.2), as offered by Amazon, only supports the simplest reactive scaling policies. To provide a better comparison between approaches, we implement the AWS AutoScale resource acquisition algorithm as closely as possible according to its documentation [2] and integrate it with Tributary’s SvcMgr to work with its more powerful scaling policies. From here on, mentions of AutoScale refer to our implementation of AWS AutoScale. AutoScale is the equivalent of the AcquireMgr component of Tributary. The default AutoScale algorithm with spot instances bids for the lowest market-priced spot instance at the on-demand price upon resource requests by the scaling policy. In addition, AutoScale terminates resources as soon as the resource requirements are lowered, choosing to terminate the resources that are most expensive at the moment.

Methodology and Terminology. To achieve fair comparisons across a wide range of data points, we perform cost analysis with simulations using historical spot market traces. Using traces allows us to test different approaches on the same period of market data and to get a better picture of the expected behavior of the system in a shorter amount of time. For each request arrival trace (Sec. 5.1) and resource acquisition approach, we present the average cost and percentage of “slow” requests across ten randomly chosen day/time starting points between January 23, 2017 and March 23, 2017 in the us-west-2 region. From here on, we define a “slow” request as a request that does not meet the latency target, and the percentage of “slow” requests as the percentage of “slow” requests over all requests in a single trace.[2]

Cost Savings and Service Latency Improvements. Fig. 4 shows the cost savings and percentage of “slow” requests for the ClarkNet trace. The cost savings are normalized against running Tributary on on-demand resources. The results demonstrate that Tributary reduces cost and “slow” requests for all three scaling policies. Cost savings are ≈85% compared to on-demand resources. For the ClarkNet trace, Tributary reduces cost by 36%, 24% and 21% compared to AutoScale for the Reactive, Predictive-LR and Predictive-MWA scaling policies, respectively. Compared to AutoScale, Tributary reduces “slow” requests by 72%, 61% and 64%, respectively, for the three scaling policies.

[2] Prediction models were trained on data from 06/06/16–01/22/17.


[Figure 4: three bar charts, one per scaling policy, comparing AutoScale, AutoScale+Buffer, and Tributary on cost (%, normalized to on-demand) and “slow” request percentage.]

Figure 4: Cost savings (red) and percentage of “slow” requests (blue) for the ClarkNet trace. (a) Reactive; (b) Predictive-LR; (c) Predictive-MWA.


In order to decrease the number of “slow” requests, popular scaling policies are often configured to provision more resources than immediately necessary, to handle unexpected increases in load. It is common to specify the resource buffer as a percentage of the expected resource requirement. For example, with a buffer of 50%, 15 resources (e.g., VCPUs) would be acquired rather than the projected 10. AutoScale+Buffer shows the cost of provisioning AutoScale with a large enough buffer that its number of “slow” requests matches that of Tributary. Tributary reduces cost by 61%, 56% and 57% compared to AutoScale+Buffer for the three scaling policies.

The cost savings for Tributary on the Berkeley trace relative to AutoScale are similar to those on the ClarkNet trace, but the reduction in the percentage of “slow” requests increases. This difference in performance is due to differing characteristics of the two traces: the ClarkNet trace experiences more minute-to-minute volatility in request rate than the Berkeley trace. We observe similar levels of cost reduction and reduction in “slow” requests on the WITS and WorldCup98 traces; results for WITS are shown in Table 2. Compared to AutoScale+Buffer, Tributary decreased costs by 47–62% across all traces.

Scaling Policy    Cost Saving    “Slow” Request Reduction
Reactive          37%            31%
Predictive-LR     33%            50%
Predictive-MWA    29%            51%

Table 2: Cost and “slow” request improvements for Tributary compared to AutoScale for the WITS trace.

Attribution of Benefits. Tributary’s superior performance arises from several factors. Much of the reduction in cost compared to AutoScale is due to Tributary’s ability to get free instance hours. Free instance hours occur when an allocation does useful work but is preempted by AWS before the end of a preemption window. The user receives a refund for the partial hour, which means that any work done by the allocation in the preemption window comes at no cost to the user. Tributary takes the probability of getting free instance hours into account when computing the expected cost of allocations (Eq. 1), often acquiring resources that provide higher opportunities for free instance hours.

Another factor in Tributary’s lower cost is its ability to remove allocations that are not likely to be preempted when demand drops. When resource demand decreases, Tributary terminates instances that are least likely to be preempted, thus lowering the expected cost of its resource footprint. The reductions in “slow” requests arise from the buffer of resources acquired by Tributary (Sec. 3.3). When acquiring instances, AcquireMgr estimates their probability of preemption. Unless all allocations have a preemption probability of zero, which never occurs for spot instances, Tributary acquires more resources than specified by the scaling policy. The primary goal of the additional resources is to ensure that, when Tributary experiences preemption events, it still has at least the specified number of resources in expectation. The additional resources also provide a secondary benefit by handling some or all of the unexpected bursts of requests that exceed the load expected by the scaling policy. The cost of these additional resources is commonly offset by free instance hours; indeed, the extra resources are acquired to cope with preemptions.

5.4 Risk Mitigation

A key feature of Tributary is that it encourages instance diversification, i.e., acquiring instances from mostly independent resource pools (Sec. 3.2). The default AutoScale policy is the lowest-price policy, which does not take diversification into account when acquiring instances; instead, it acquires the cheapest instance. As illustrated in Fig. 1, Tributary acquires different types of instances in different availability zones, while AutoScale acquires instances of a single type (all red). Diversifying across resource pools is important because each pool has an independent spot market, which avoids highly correlated allocation preemptions within a single instance market. Acquiring too much from a single pool, as often occurs with AutoScale, creates a high risk of SLO violation when preemption events occur (e.g., if the red allocation in Fig. 1c were preempted prior to minute 35).

In our experiments, we found it to be very rare for market prices to rise above on-demand prices, meaning that AutoScale rarely experiences preemption events. However, when examining past EC2 spot market traces and other availability zones, we found it to be significantly more common for the market price to rise above the on-demand price, thus preempting AutoScale instances.³

³ From 01/23/17–03/20/17, the market price rose above the on-demand price 0 times for the c4.2xlarge instance type in us-west-2. From 11/1/16–01/22/17, it happened 1073 times.


[Figure 5: Comparing to ExoSphere and Proteus. Four panels plot “slow” request percentage and cost (%, normalized to on-demand): (a) Reactive and (b) Predictive-LR for Exo Small α, Exo Large α, and Tributary; (c) Reactive and (d) Predictive-LR for Proteus, Proteus+Buffer, and Tributary. Predictive-MWA results not shown but similar.]

Since Amazon charges users the market price and not the bid price, it is possible that Amazon may once again preempt instances bidding the on-demand price with regularity, a phenomenon we recently observed in the us-east availability zones. Thus, AutoScale’s resource acquisition approach is riskier for services with latency SLOs on spot machines.

Cost of Diversified AutoScale. In addition to the default AutoScale policy, which acquires the lowest-priced instance, AWS also offers a diversified AutoScale policy that starts instances from a diverse set of resource pools [4]. Acquiring instances from different spot markets reduces preemption risk, but our experiments showed that it increases cost by 8–12% compared to the lowest-price AutoScale policy. Compared to Tributary, which diversifies across spot markets intelligently, the diversified AutoScale policy cost 68% more to achieve the same number of “slow” requests for the reactive scaling policy on the ClarkNet trace.

5.5 Pricing Model Discussion

Our experimental results are based on current AWS EC2 billing policies, as described in Section 2.1. This section discusses how Tributary would function under two potential changes to the billing model: (1) elimination of preemption refunds, and (2) institution of a free market.

Elimination of preemption refunds. If Amazon eliminated refunds for allocations preempted within their first hour of usage (i.e., when the market price exceeds the bid price), Tributary would lose its incentive to bid close to the market price. Tributary’s model would capture this change by setting β in Eq. 2 to zero. With higher bids, Tributary would acquire fewer resources because preemption would be less likely. The amount of resources acquired would still exceed the amount required, as allocations would still have non-zero preemption probabilities.
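We do not reproduce Eq. 2 here, but as an illustration of the role of β, the sketch below (ours; the way β enters the expected cost is an assumption) shows how zeroing a refund-weight term removes the incentive to court preemption:

    # Hypothetical sketch of the billing-model change. One plausible way a
    # refund weight `beta` could enter an expected-cost estimate, so that
    # setting beta = 0 models "no preemption refunds".

    def expected_hour_cost(spot_price, p_preempt, beta=1.0):
        # beta = 1: preempted partial hours are fully refunded (current model).
        # beta = 0: preempted hours are paid like any other (no refunds).
        refund = beta * spot_price * p_preempt
        return spot_price - refund

    # With beta = 0, expected cost no longer falls as p_preempt rises, so
    # bidding near the market price loses its free-instance-hour upside.
    print(expected_hour_cost(0.10, 0.30, beta=1.0))  # 0.07
    print(expected_hour_cost(0.10, 0.30, beta=0.0))  # 0.10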

Although Tributary extracts significant benefit from the refunds, it still outperforms AutoScale without them. For example, in a simulation with this billing-model modification, Tributary still reduces cost by 31% compared to AutoScale with sufficient buffer to match its number of “slow” requests, for the ClarkNet trace using the reactive scaling policy. As expected, Tributary continues to meet SLOs with high likelihood, as it continues to diversify its resource pool and to acquire buffers of resources (albeit smaller ones) to account for preemption events.

Free market behavior. In its current design, the AWS EC2 spot market does not behave as a free market [9]. Customers specify their bid prices for a given resource, but generally do not pay that amount. Instead, a customer is billed according to the EC2-determined spot price for that resource. It is possible, perhaps even likely as the spot market becomes more popular, that AWS will transition toward a billing policy in which users are charged their bid price instead of the market price, with prices moving based on supply and demand rather than unknown seller policies. This change would render obsolete the commonly used strategy of bidding far above the market price (e.g., bidding the on-demand price). Tributary’s behavior would not change significantly, as it already often sets bid prices close to market prices and explicitly considers revocation risks, and we believe it would therefore outperform other approaches by even larger margins.

5.6 Comparing to State of the Art

This section compares Tributary’s support for elastic services to two state-of-the-art resource managers designed for preemptible instances. Since neither system was designed for elastic services with latency SLOs, Tributary unsurprisingly performs significantly better.

ExoSphere. We implemented ExoSphere’s allocation strategy, described in Sec. 2.2, with the following assumptions and modifications: (i) the ExoSphere paper does not specify whether the correlation between markets is recomputed over time, so to avoid the need to constantly reconstruct ExoSphere’s resource footprint, we assumed static correlation between markets; (ii) as the ExoSphere paper does not provide guidelines on how to choose α, we experimented with a range of α from 1 to 10^9. Higher α instructs ExoSphere to be more risk-averse, at the expense of higher cost.
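For intuition about α, the following toy sketch (our illustration of a portfolio-style objective in the spirit of ExoSphere’s approach, not its actual code; all numbers are made up) shows how increasing α shifts the preferred allocation from concentrated to diversified:

    # Hypothetical portfolio-style objective: expected savings minus an
    # alpha-weighted risk (variance) term. Data are illustrative.
    import numpy as np

    def portfolio_score(weights, exp_savings, cov, alpha):
        """Higher is better: expected savings minus alpha-weighted risk."""
        w = np.asarray(weights, dtype=float)
        return w @ exp_savings - alpha * (w @ cov @ w)

    # Market 0 is cheap but volatile; market 1 saves less but is stable.
    exp_savings = np.array([0.6, 0.3])
    cov = np.array([[0.10, 0.00],
                    [0.00, 0.02]])

    all_in_cheap = [1.0, 0.0]
    split        = [0.5, 0.5]
    for alpha in (1.0, 100.0):
        print(alpha,
              portfolio_score(all_in_cheap, exp_savings, cov, alpha),
              portfolio_score(split, exp_savings, cov, alpha))
    # Small alpha favors the all-in cheap allocation (little diversity);
    # large alpha favors the split, matching the behavior described above.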

Fig. 5 shows the normalized cost and percentage of “slow” requests served for Tributary and for ExoSphere with small (1) and large (10^9) values of α. These experiments were performed on a further scaled-up version of the ClarkNet trace (100x the already-scaled version), since ExoSphere was designed for 100s to 1000s of instances and performs poorly at a scale of 10s.⁴ In our experiments, we observed that ExoSphere with a small α tends to acquire mainly the cheapest resources, inducing little diversity and increasing the number of “slow” requests in the event of preemptions. Tributary’s advantage in both cost and SLO attainment results from its exploitation of spot instance characteristics (Sec. 5.3).

⁴ At small scales, ExoSphere with low α had no resource diversity. With large α, it acquired too many resources, increasing its cost.


Proteus. We implemented Proteus’s allocation strategy, described in Sec. 2.2, modified to acquire only spot resources (reducing cost with no significant change in SLO attainment). Fig. 5 compares Tributary and Proteus on the ClarkNet trace for two different scaling policies. While Proteus achieves lower cost than Tributary, it experiences a large increase in “slow” requests. This increase is due to Proteus not diversifying its resource pool, instead acquiring resources based only on reducing average per-core cost. When told by the scaling policy to acquire additional buffer resources, similarly to AutoScale+Buffer (Sec. 5.3), Proteus is unable to match Tributary’s number of “slow” requests no matter how large the buffer (and, thus, how high the cost). This is, once again, due to the lack of diversity in the resources Proteus acquires.
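For contrast with Tributary’s diversified acquisition, here is a minimal sketch of the per-core-cost criterion attributed to Proteus above (the offer data are invented):

    # Hypothetical sketch: pick whichever spot offer minimizes $/core,
    # with no regard for market diversity.

    offers = [
        # (market, cores, spot price $/hr)
        ("us-west-2a/c4.large",  2, 0.030),
        ("us-west-2a/c4.xlarge", 4, 0.055),
        ("us-west-2b/c4.xlarge", 4, 0.070),
    ]

    best = min(offers, key=lambda o: o[2] / o[1])
    print(best)  # always the cheapest $/core market: a concentrated footprint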

5.7 Prediction Model Evaluations

This section evaluates the accuracy of the preemption prediction models used by Tributary, which are described in Sec. 3.1. The recent Proteus system [16] used the historical median probability of preemption given the instance type, the availability zone, and the difference between the user’s bid price and the spot market price of the resource. Tributary improves prediction accuracy by using machine learning inference models trained on historical spot market data with engineered features. Fig. 6 shows the accuracy and F1 scores for prediction models based on the historical median, a logistic regression classifier, a multilayer perceptron neural network (MLP NN), and a long short-term memory recurrent neural network (LSTM RNN). These models were trained on spot market data from 06/06/16 – 01/22/17 and evaluated on data from 01/23/17 – 03/20/17 for instance types c4.large, c4.xlarge, and c4.2xlarge in us-west-2.
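As a simplified illustration of the classifier-based approach (not Tributary’s actual feature set or training pipeline; the features, data, and label model below are synthetic stand-ins):

    # Hypothetical sketch: predict whether an allocation at a given bid
    # delta will be preempted within the window.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000
    bid_delta  = rng.uniform(0.0, 0.5, n)   # bid price minus market price
    recent_vol = rng.uniform(0.0, 0.2, n)   # recent price volatility
    # Toy ground truth: low bid deltas and volatile markets preempt more.
    p_true = 1 / (1 + np.exp(10 * bid_delta - 8 * recent_vol - 0.5))
    preempted = rng.random(n) < p_true

    X = np.column_stack([bid_delta, recent_vol])
    model = LogisticRegression().fit(X, preempted)

    # Probability of preemption for a fresh query, as consumed by Eq. 1:
    print(model.predict_proba([[0.05, 0.15]])[0, 1])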

The output of the prediction models is whether the instance specified in a query will be preempted within the preemption window. Accuracy scores are calculated as the number of samples classified correctly divided by the total number of samples. F1 scores, which account for data skew, are a good accuracy measurement because the data set is skewed toward preemptions at lower bid deltas and non-preemptions at higher bid deltas. The LSTM RNN model provides the best accuracy and the best F1 score because it is able to capture the temporal nature of the AWS spot market: it increases accuracy by 11% and the F1 score by 27% compared to using the historical median. The MLP NN model performs worse than the historical median model on accuracy, but its F1 score is higher because, unlike the historical median model, the MLP model considers advanced features when predicting preemptions, as described in Sec. 3.1. The increased accuracy of the LSTM RNN model translates to Tributary’s effectiveness: when using the LSTM RNN model, Tributary runs at ≈37% less cost on the ClarkNet workload compared to Tributary using historical medians, because the historical median model overestimates the probability of preemption, causing Tributary to acquire more resources than necessary.
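The value of reporting F1 alongside accuracy on skewed data can be seen with a small example (the numbers are ours, not from the evaluation):

    # Why F1 matters on skewed data: a classifier that never predicts
    # "preempted" looks accurate but has F1 = 0.
    from sklearn.metrics import accuracy_score, f1_score

    y_true = [1] * 10 + [0] * 90    # 10% preemptions (skewed)
    y_always_safe = [0] * 100       # degenerate "never preempted" model

    print(accuracy_score(y_true, y_always_safe))              # 0.90: looks good
    print(f1_score(y_true, y_always_safe, zero_division=0))   # 0.0: reveals failure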

[Figure 6: Prediction accuracy (%) and F1 score (%) for predicting preemption of AWS spot instances, for the Historical Median, Logistic Regression, MLP NN, and LSTM RNN models; F1 accounts for data skew. The LSTM RNN outperforms prior techniques by 11% on the accuracy metric and 27% on the F1 score metric.]

6 Conclusion

Tributary exploits AWS spot instances to meet latency SLOs for elastic services at lower cost. By predicting preemption probabilities and acquiring diverse resource footprints, Tributary can aggressively use collections of cheap spot instances to reliably meet SLOs even in the face of bulk preemptions. Our experiments show cost savings of 81–86% relative to using non-preemptible on-demand instances and 47–62% relative to traditional high-risk use of spot instances.

Tributary exploits AWS properties, such as dynamic spot markets and preemption based thereon. We believe its approach would also work for other clouds offering preemptible resources, if they expose enough information to predict preemption probabilities, which AWS provides via the visible spot market prices. Currently, Google Compute Engine [5] does not expose such a signal for its preemptible instances. For private clouds, exposing preemption logs could provide the historical view, but even better predictions could be enabled by exposing scheduler state.

Acknowledgements: We thank our USENIX ATC ’18 reviewers and our shepherd, Christopher Stewart, for valuable suggestions. We thank Henggang Cui for his feedback. We thank the members and companies of the PDL Consortium (Alibaba, Broadcom, Dell EMC, Facebook, Google, HPE, Hitachi, IBM, Intel, Micron, Microsoft, MongoDB, NetApp, Oracle, Salesforce, Samsung, Seagate, Two Sigma, Toshiba, Veritas and Western Digital) for their interest, insights, feedback, and support. This research is supported in part by Intel as part of the Intel STC for Visual Cloud Systems (ISTC-VCS), National Science Foundation under awards CNS-1042537, CCF-1533858, CNS-1042543 (PRObE [13]), and DARPA Grant FA87501220324. Lastly, one of the co-authors is supported in part by DHS Award HSHQDC-16-3-00083, NSF CISE Expeditions Award CCF-1139158, and gifts from Alibaba, Amazon Web Services, Ant Financial, CapitalOne, Ericsson, GE, Google, Huawei, Intel, IBM, Microsoft, Scotiabank, Splunk and VMware.


References

[1] Amazon EC2 Spot Instances. https://aws.amazon.com/ec2/spot.

[2] AWS Autoscale. https://aws.amazon.com/autoscaling/.

[3] AWS EC2. http://aws.amazon.com/ec2/.

[4] AWS EC2 Spot Fleet. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html.

[5] Google Compute Engine. https://cloud.google.com/compute/.

[6] Spot Bid Advisor. https://aws.amazon.com/ec2/spot/bid-advisor/.

[7] TensorFlow Serving. https://tensorflow.github.io/serving.

[8] ABADI, M., BARHAM, P., CHEN, J., CHEN, Z., DAVIS, A., DEAN, J., DEVIN, M., GHEMAWAT, S., IRVING, G., ISARD, M., KUDLUR, M., LEVENBERG, J., MONGA, R., MOORE, S., MURRAY, D. G., STEINER, B., TUCKER, P., VASUDEVAN, V., WARDEN, P., WICKE, M., YU, Y., AND ZHENG, X. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI 16).

[9] AGMON BEN-YEHUDA, O., BEN-YEHUDA, M., SCHUSTER, A., AND TSAFRIR, D. Deconstructing Amazon EC2 spot instance pricing. ACM Transactions on Economics and Computation 1, 3 (2013), 16.

[10] DANZIG, P., MOGUL, J., PAXSON, V., AND SCHWARTZ, M. The Internet Traffic Archive. URL: http://ita.ee.lbl.gov/ (2000).

[11] GALANTE, G., AND BONA, L. C. E. D. A survey on cloud computing elasticity. In Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing (Washington, DC, USA, 2012), UCC ’12, IEEE Computer Society, pp. 263–270.

[12] GANDHI, A., HARCHOL-BALTER, M., RAGHUNATHAN, R., AND KOZUCH, M. A. AutoScale: Dynamic, robust capacity management for multi-tier data centers. ACM Trans. Comput. Syst. 30, 4 (Nov. 2012), 14:1–14:26.

[13] GIBSON, G., GRIDER, G., JACOBSON, A., AND LLOYD, W. PRObE: A thousand-node experimental cluster for computer systems research. USENIX 38, 3 (2013).

[14] GONG, Z., GU, X., AND WILKES, J. PRESS: Predictive elastic resource scaling for cloud systems. In 6th IEEE/IFIP International Conference on Network and Service Management (CNSM 2010) (Niagara Falls, Canada, 2010).

[15] WAND NETWORK RESEARCH GROUP, ET AL. WITS: Waikato Internet Traffic Storage. URL: http://wand.net.nz/wits/index.php.

[16] HARLAP, A., TUMANOV, A., CHUNG, A., GANGER, G. R., AND GIBBONS, P. B. Proteus: Agile ML elasticity through tiered reliability in dynamic resource markets. In Proceedings of the Twelfth European Conference on Computer Systems (New York, NY, USA, 2017), EuroSys ’17, ACM, pp. 589–604.

[17] HOXMEIER, J. A., AND DICESARE, C. System response time and user satisfaction: An experimental study of browser-based applications. AMCIS 2000 Proceedings (2000), 347.

[18] KALISKI, B. PKCS #5: Password-based cryptography specification version 2.0.

[19] KOHAVI, R., AND LONGBOTHAM, R. Online experiments: Lessons learned. Computer 40, 9 (2007).

[20] LIU, H., AND WEE, S. Web Server Farm in the Cloud: Performance Evaluation and Dynamic Architecture. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 369–380.

[21] MARATHE, A., HARRIS, R., LOWENTHAL, D., DE SUPINSKI, B. R., ROUNTREE, B., AND SCHULZ, M. Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (2014), ACM, pp. 279–290.

[22] MULLER, K., AND VIGNAUX, T. SimPy: Simulating systems in Python. ONLamp.com Python DevCenter (2003).

[23] SHARMA, P., GUO, T., HE, X., IRWIN, D., AND SHENOY, P. Flint: Batch-interactive data-intensive processing on transient servers. In Proceedings of the 11th European Conference on Computer Systems (EuroSys 16) (2016), ACM, p. 6.

[24] SHARMA, P., IRWIN, D., AND SHENOY, P. Portfolio-driven resource management for transient cloud servers. Proceedings of the ACM on Measurement and Analysis of Computing Systems 1, 1 (2017), 5.

[25] SHARMA, P., LEE, S., GUO, T., IRWIN, D., AND SHENOY, P. SpotCheck: Designing a derivative IaaS cloud on the spot market. In Proceedings of the Tenth European Conference on Computer Systems (2015), ACM, p. 16.

[26] SHARMA, U., SHENOY, P., SAHU, S., AND SHAIKH, A. A cost-aware elasticity provisioning system for the cloud. In 2011 31st International Conference on Distributed Computing Systems (June 2011), pp. 559–570.

[27] SHASTRI, S., AND IRWIN, D. HotSpot: Automated server hopping in cloud spot markets.

[28] SOUDERS, S. Velocity and the bottom line. In Velocity (Web Performance and Operations Conference) (2009).

[29] SUNDERMEYER, M., SCHLUTER, R., AND NEY, H. LSTM neural networks for language modeling. In Interspeech (2012), pp. 194–197.

[30] TANG, S., YUAN, J., AND LI, X.-Y. Towards optimal bidding strategy for Amazon EC2 cloud spot instance. In Proceedings of the 5th IEEE International Conference on Cloud Computing (CLOUD 12) (2012), IEEE, pp. 91–98.

[31] WANG, W., LI, B., AND LIANG, B. To reserve or not to reserve: Optimal online multi-instance acquisition in IaaS clouds. In ICAC (2013), pp. 13–22.

[32] XU, Z., STEWART, C., DENG, N., AND WANG, X. Blending on-demand and spot instances to lower costs for in-memory storage. In INFOCOM 2016: The 35th Annual IEEE International Conference on Computer Communications (2016), IEEE, pp. 1–9.

[33] ZHENG, L., JOE-WONG, C., TAN, C. W., CHIANG, M., AND WANG, X. How to bid the cloud. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (2015), ACM, pp. 71–84.
