1/23

ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services

Amiya K. Maji, Subrata Mitra, Saurabh Bagchi

School of Electrical and Computer Engineering, Purdue University

West Lafayette, Indiana

2/23

Introduction

• Long latency and variable latency of cloud services degrade user experience

• Interference between VMs on the same physical machine is a key factor for such latency perturbations

3/23

Performance Interference due to Shared Hardware Resources

• Other shared resources
– Memory bandwidth
– Network/IO
– Translation Lookaside Buffer (TLB)

• Performance Interference
– Performance of one VM suffering due to activity of another co-located VM
– One common occurrence is due to shared cache

[Figure: Multi-core Cache Sharing. Processors P1 and P2, each with its own L1 cache, share the L2 (last-level) cache.]

4/23

Remediation Techniques

• Existing solutions
– Better scheduling [Paragon ASPLOS’13, QCloud Eurosys’10]
– Live migration [Deepdive ATC’13]
– Resource containment [CPI2 Eurosys’13]
– These require changes in the hypervisor, which is not feasible in a public cloud

• Our prior work
– IC2 [Middleware’14]
  • Interference-aware Cloud Application Configuration
– Advantages
  • User-level control, no hypervisor modification
  • 30-40% response time (RT) improvement during interference
– Disadvantages
  • High overhead of web server reconfiguration
  • Cannot improve RT further without degrading throughput

5/23

Typical Load-balanced WS Setup

• Latency of a WS VM increases during interference
• LB has no knowledge of interference and hence treats all VMs identically

[Figure: a Load Balancer (LB) in front of three WS VMs]

6/23

ICE: An Integrated Configuration Engine for Interference Mitigation

• Animating Insights
– Reducing server load limits the impact of interference
– Most large-scale web servers are placed behind load balancers
– Use available residual capacity in a WS cluster efficiently

• Objectives
– Make reconfiguration (interference mitigation) faster
– Make existing load balancers interference-aware
– Get better response time during interference (than IC2)

7/23

ICE Workflow

• Detect interference in predictive mode by mining patterns of system usage values, e.g., Cycles per instruction (CPI), Cache Miss Rate (CMR)

• Two-level reconfiguration
– 1. Update load balancer weight
  • Less overhead. More agile.
– 2. Update middleware parameters
  • Only for long interferences. Reduces overhead of idle threads.
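
The two-level policy above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the authors' implementation: the names `VMState`, `choose_actions`, the action strings, and the 30-second cutoff `LONG_INTERFERENCE_SECS` are hypothetical, since the slides do not say how long an interference must last before it counts as "long".

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff: how long interference must persist before the
# heavier middleware (web server) reconfiguration is triggered.
LONG_INTERFERENCE_SECS = 30.0


@dataclass
class VMState:
    """Remembers when interference was first detected on a WS VM."""
    interfered_since: Optional[float] = None


def choose_actions(state: VMState, interfered: bool, now: float) -> List[str]:
    """Return the reconfiguration actions for one monitoring interval."""
    if not interfered:
        state.interfered_since = None  # interference is over; reset the clock
        return []
    if state.interfered_since is None:
        state.interfered_since = now   # onset of a new interference episode
    actions = ["update_lb_weight"]     # level 1: cheap and agile
    if now - state.interfered_since >= LONG_INTERFERENCE_SECS:
        actions.append("update_middleware")  # level 2: long interference only
    return actions
```

Called once per monitoring interval, the function always reacts at the load balancer first and escalates to the middleware only when the same episode has outlasted the cutoff.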

8/23

ICE Design

• Key components
1. Monitoring Engine (ME)
2. Interference Detector (DT)
3. LB Config Engine (LBE)
4. WS Config Engine (WSE)

9/23

Interference Detection

• We use hardware counters for interference detection
– Faster detection
– Hypervisor access not required if counters are virtualized

• Use CPI and CMR from training runs to build a Decision Tree
– Decision Tree is easy to interpret
– Low detection (classification) overhead

[Figure: Sample run with CloudSuite]
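
To make the detection step concrete, here is a minimal sketch of a decision tree trained on (CPI, CMR) pairs. It is a generic Gini-based tree written in plain Python, not the classifier used by the authors, and the training samples are made up purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (score, feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(rows, labels, depth=0, max_depth=2):
    """Recursively build a small tree; leaves hold the majority label."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def classify(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


# Made-up training samples: (CPI, CMR) -> True if interference was running.
samples = [(1.0, 0.02), (1.1, 0.03), (1.2, 0.02), (2.4, 0.10), (2.8, 0.12), (2.2, 0.09)]
labels = [False, False, False, True, True, True]
tree = build_tree(samples, labels)
```

Classification is a handful of comparisons per sample, which is consistent with the low detection overhead noted on the slide, and the learned thresholds can be read directly off the tree.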

10/23

ICE: Load Balancer Reconfiguration

• Objective: Keep WS VM’s CPU utilization below a threshold U_thres
• If predicted CPU is above threshold, find a new request rate such that it goes below threshold
• Request rate (RPS) is determined by server weight value in load balancer configuration
• Use the following empirical function for load estimation

Predicted Util = f(Past Util, CPI, RPS), where CPI is the indicator of interference
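
As a rough illustration of how the load estimation function could drive the weight update, the sketch below assumes a linear estimator and assumes that the request rate a server receives is proportional to its load balancer weight. The class `LinearUtilModel`, the function `new_weight`, and every coefficient value used with them are hypothetical; the slides give neither the fitted coefficients nor the exact weight-update rule.

```python
from dataclasses import dataclass


@dataclass
class LinearUtilModel:
    """Assumed form: util = b0 + b_util*past_util + b_cpi*cpi + b_rps*rps."""
    b0: float
    b_util: float
    b_cpi: float
    b_rps: float

    def predict(self, past_util, cpi, rps):
        """Predicted CPU utilization for the next interval."""
        return (self.b0 + self.b_util * past_util
                + self.b_cpi * cpi + self.b_rps * rps)

    def max_rps(self, past_util, cpi, u_thres):
        """Largest request rate whose predicted utilization is <= u_thres."""
        if self.b_rps <= 0:
            raise ValueError("model must predict higher utilization at higher RPS")
        rps = (u_thres - self.b0 - self.b_util * past_util
               - self.b_cpi * cpi) / self.b_rps
        return max(rps, 0.0)


def new_weight(model, weight, past_util, cpi, rps, u_thres, min_weight=1):
    """Scale the server weight down when predicted utilization exceeds u_thres."""
    if model.predict(past_util, cpi, rps) <= u_thres:
        return weight  # predicted load is acceptable; keep the current weight
    target_rps = model.max_rps(past_util, cpi, u_thres)
    # Assumption: request rate is proportional to weight. Round down so the
    # server ends up at or below the target rate, never above it.
    return max(min_weight, int(weight * target_rps / rps))
```

For example, with coefficients `(0.0, 0.5, 8.0, 0.125)`, a past utilization of 60, a CPI of 2.5 and 320 RPS, the predicted utilization is 90; a threshold of 70 gives a target of 160 RPS, so a weight of 100 is halved.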

11/23

ICE: Training Decision Tree and Estimator

• Run CloudSuite with various interference intensities, for two different interference generators: DCopy and LLCProbe
• Monitor: CPI, CMR, CPU, RPS, RT
• Collected data is labeled based on when interference was running
• Labeled data used for building Decision Tree
• Observations during interference used to build estimator
• Multivariate regression using R
• Linear model on (OLDCPU, RPS, CPI) chosen, since higher-degree models gave little added benefit
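
The slides state that the estimator was fitted with multivariate regression in R. For readers who want to see the mechanics, here is an equivalent ordinary least squares fit in plain Python, solving the normal equations by Gaussian elimination. The function name `fit_linear` is an assumption of this sketch, and any data passed to it here is synthetic, not the authors' measurements.

```python
def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations (A^T A) beta = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]
```

Each row of `X` would hold one observation of (OLDCPU, RPS, CPI) taken during interference, and `y` the CPU utilization measured in the following interval; the returned coefficients define the linear estimator.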

12/23

ICE: Web Server Reconfiguration

• WS reconfiguration is applied only if interference is long lasting
– Similar to heuristic presented in IC2 [Middleware’14]

• During periods of interference, optimal Apache/PHP parameters change
– MaxClients (MXC) reduces
– KeepAliveTimeout (KAT) increases
– pm.max_children (PHP) increases

• Under interference, the following actions are needed to improve response time: MXC ↓, KAT ↑, PHP ↑
• Value of update determined using empirical functions

13/23

Reconfiguring Apache and PHP

• No interference
– Minimal latency with MXC=8, PHP=2

• During interference
– Minimal latency with MXC=4, PHP=4

14/23

Evaluation

• Experimental Setup
– CloudSuite (Olio) benchmark with different interferences
– Middlewares: HAProxy+Apache+PHP
– Interferences: LLCProbe, Dcopy
– We look at ICE with two load balancer scheduling policies
– Weighted Round Robin (WRR or simply RR)
  • ICE with WRR shows comparison against a static configuration
– Weighted Least Connection (WLC or simply LC)
  • ICE with WLC shows comparison against an out-of-box dynamic load balancer
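
For readers unfamiliar with the two scheduling policies, the following sketch shows the textbook idea behind each. It is a simplified illustration, not the algorithm of any particular load balancer: production implementations typically interleave servers more smoothly, and the function names and server names are assumptions.

```python
import itertools


def weighted_round_robin(weights):
    """Cycle through servers in proportion to their static integer weights."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def weighted_least_connection(weights, active):
    """Pick the server with the fewest active connections per unit weight."""
    candidates = [s for s, w in weights.items() if w > 0]
    return min(candidates, key=lambda s: active[s] / weights[s])
```

WRR ignores the current state of the servers, which is why it behaves like a static configuration, while WLC reacts to connection backlog on its own. In both cases, lowering the weight of an interfered VM reduces the share of requests it receives.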

15/23

Result 1: Comparing ICE with Existing Solutions

• Baseline is static config, IC2 is Middleware’14 solution (WS reconfiguration only), ICE is two-level reconfiguration (current paper)

• ICE improves response time both in RR and LC
• LC (out-of-box) reduces effect of interference significantly, but occasional spikes remain
• ICE reduces frequency and magnitude of these spikes

[Figure: two panels, Round Robin (RR) Load Balancer and Least Connection (LC) Load Balancer; annotated values: 400ms, 200ms]

16/23

Result 2: Improvement in Response Time and Detection Latency

• ICE improves median response time by up to 94% compared to a static configuration (RR)

• ICE improves median response time by up to 39% compared to a dynamic load balancer (LC)

• Median interference detection latency, i.e., time from onset of interference to first-level reconfiguration at LB, is low
– 3 sec using ICE; 15-20 sec for IC2

[Figure: two panels, Round Robin (RR) Load Balancer and Least Connection (LC) Load Balancer]
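
The improvement figures above are relative reductions in median response time. A minimal sketch of that calculation follows; the function name is an assumption and the numbers in the usage note are invented, not measurements from the evaluation.

```python
from statistics import median


def median_improvement_pct(baseline_rts, improved_rts):
    """Percentage reduction in median response time relative to the baseline."""
    base = median(baseline_rts)
    return 100.0 * (base - median(improved_rts)) / base
```

For instance, if the baseline samples have a median of 200 ms and the samples under reconfiguration have a median of 100 ms, the function reports a 50% improvement.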

17/23

Applying ICE to Other Web Services

• Can the basic principles of ICE (two-level reconfiguration) be applied to other web services?
– Yes, at least to media streaming services

• Consider Darwin media streaming server
– Long-lasting sessions
– Longer responses (video streams) compared to web requests
– Mostly static content vs. dynamic website (Olio)

• Questions:
– Can we find a Darwin configuration that can mitigate interference?
– Does changing load-balancer weights improve latency?

18/23

Experimental Setup

• Application
– Darwin streaming server
– CloudSuite Media Streaming benchmark

• Middleware
– LVS load balancer, Darwin Streaming Server

• Interference
– LLCProbe

• Server Performance Metric
– Frame delay: Time between when a video frame was expected to be sent by the server and when it was actually sent
– Frame delay should be <= 0 for correct operating points

• Parameters
– run_num_threads in Darwin and server_weight in LVS
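
The frame delay metric defined above can be written down directly. This is a small sketch under the assumption that expected and actual send times are available as timestamps in seconds; the function names are hypothetical.

```python
def frame_delays(expected_send_times, actual_send_times):
    """Per-frame delay: actual send time minus expected send time (seconds)."""
    return [actual - expected
            for expected, actual in zip(expected_send_times, actual_send_times)]


def is_correct_operating_point(delays):
    """True when no frame was sent later than expected (all delays <= 0)."""
    return all(d <= 0 for d in delays)
```

A positive delay means the server fell behind its streaming schedule, so a configuration is acceptable only if every frame has a delay of zero or less.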

19/23

Optimal num_threads with Interference

• 10,000 concurrent clients to a 2-node Darwin cluster
• Optimal num_threads changes from 10 to 150 during interference
– This is comparable to PhpMaxChildren config in PHP-FPM server
• Frame delay improves significantly with different num_threads during interference

[Figure: two panels, No interference and With LLCProbe]

20/23

Improvement in Latency with Different LB Weights

• Consider two num_threads settings
– Optimal value for no interference (10)
– Optimal value for interference (150)

• Vary load balancer weight for both these settings to see impact on latency

• Reducing LB weights does improve latency in both cases
• By reducing load sufficiently (e.g., 70% here), latency is as good as during no interference
– Similar to our previous observation with Olio

[Figure: With LLCProbe]

21/23

Concluding Insights

• Effect of interference can be mitigated by reducing load on the affected VM, through a load balancer
• We presented ICE for two-level configuration in WS clusters
– First level: Reconfigure load balancer
– Second level: Reconfigure the web service (only for longer-lasting interference)
• ICE improves median response time of a representative web service by up to 94% compared to a static configuration and by up to 39% compared to a dynamic out-of-box load balancer
• Median interference detection latency is low: 3 seconds
• The basic principle of ICE is also applicable to streaming servers
• Future work:
– Handling other types of interference: network, storage, etc.
– Finding “useful” configuration parameters automatically

22/23

Questions

23/23

Thank You!

24/23

25/23

Backup Slides