Upload
duy
View
49
Download
3
Embed Size (px)
DESCRIPTION
Power and Performance Management of Virtualized Computing Environments via Lookahead Control. Dara Kusic 1 , Jeffrey O. Kephart 2 , James E. Hanson 2 , Nagarajan Kandasamy 1 , and Guofei Jiang 3 1- Drexel University, Philadelphia, PA 19104 - PowerPoint PPT Presentation
Citation preview
1/48
Power and Performance Management of Virtualized Computing Environments via
Lookahead Control
Dara Kusic1, Jeffrey O. Kephart2, James E. Hanson2, Nagarajan Kandasamy1, and Guofei Jiang3
1- Drexel University, Philadelphia, PA 191042- IBM T.J. Watson Research Center, Hawthorne, NY 10532
3- NEC Labs America, Princeton, NJ 08540
Presented by Tongping Liu
2/48
OUTLINEMotivation and problem statementDescription of the experimental testbedProblem formulation and controller designPerformance resultsConclusions
3/48
DATA-CENTER ENERGY COSTSServer energy
consumption is growing at 9% per year
Carbon dioxide emissions as percentage of world total – industries
Carbon emissions by countries (Metric tons of CO2 per year)
0.30.6
0.8 1.0
Data centers
Airlines Shipyards Steel plants
170 142 146 178
Data centers
Argentina Nether-lands
MalaysiaMcKinsey & Co. Report: http://uptimeinstitute.org/content/view/168/57
Data centers are projected to surpass the airline industry in CO2 emissions by 2020
4/48
SERVER UTILIZATION IN DATA CENTERS
Server utilization averages about 6%, accounting for idle servers
Average daily server utilization (%)
10
20
30
40
50
60
70
80
90
100
Peak daily server utilization (%)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 90 100
Up to 30% of servers are idle!
McKinsey & Co. Report: http://uptimeinstitute.org/content/view/168/57
5/48
Performance-isolated platforms, called virtual machines (VMs), allow resources (e.g., CPU, memory) to be shared on a single server
VIRTUALIZATION AS THE ANSWER
TechniqueEfficiency
Impact
• 3-5%
Deploy virtualization for existing and new demand
• 25-30%
Implement free cooling • 0-15%
Introduce greener and more power efficient servers
• 10-20%
Selectively turn off core components to increase remaining unit efficiency
McKinsey & Co. Report: http://uptimeinstitute.org/content/view/168/57
Enables consolidation of online services onto fewer servers
Increases per-server utilization and mitigates “server sprawl”
Enables on-demand computing, a provisioning model where resources are dynamically provisioned as per workload demand
6/48
We address combined power and performance management in a virtualized computing environment
– The problem is posed as one of sequential optimization under uncertainty and solved using limited look-ahead control (LLC)
– The notion of risk is encoded explicitly in the problem formulationSummary of main results
– A server cluster managed using LLC saves 26% in power-consumption costs over a 24 hour period when compared to an uncontrolled system
– Power savings are achieved with very few SLA violations (1.6% of the total number of requests)
PROBLEM STATEMENT
7/48
OUTLINEMotivation and problem statementDescription of the experimental testbedProblem formulation and controller designPerformance resultsConclusions
8/48
THE EXPERIMENTAL TESTBED
Gol
dW
ebS
pher
e
O S
Chronos D em eter
D ispatcher
W orkload A rrivals
Application T ier
Apollo Poseidon
O perational H osts Powered-dow nH osts
Database T ie r
S leep
Sleep
f11(k ) f21(k ) f12(k) f13(k )
LLCContro llable Param eters:
host on/o ff: N (k), VM on/off: n i(k),W ork load fraction: (k), C PU share f(k)
Eros
f2n(k )
Eros
Bacchus
Gol
dW
ebS
pher
e
O S
Gol
dW
ebS
pher
e
O S
Silv
erW
ebS
pher
e
O S
Silv
erW
ebS
pher
e
O S
Gol
dD
B2
O S
Silv
erD
B2
O S
Gol
dD
B2
O S
Silv
erD
B2
O S
Silv
erD
B2
O S
Silv
erW
ebS
pher
e
O S
Gol
dD
B2
O S
f22(k )
)(),( 21 kk
The testbed is a two-tier architecture with front-end application servers and back-end databases
It hosts two online services (Gold and Silver)
Servers are virtualized Performance goals
– Minimize power consumption– Minimize SLA violations
We target the application and the database tiers
)(),( 1}..1{21}..1{1 kk nn
9/48
EXPERIMENTAL SYSTEM Six Dell servers (models 2950 and 1950) comprise the experimental testbed Virtualization of the CPU and memory is enabled by VMware ESX Server 3.0 Virtual machines run SUSE Enterprise Linux Server Edition 10 Control directives use the VMware API, Linux shell commands, and IPMI Silver application is Trade6 only; Gold application is Trade6 + extra CPU load
Host name CPU speed # of CPU cores MemoryApollo 2.3 GHz 8 8 GBBacchus 2.3 GHz 2 8 GBChronos 1.6 GHz 8 4 GBDemeter 1.6 GHz 8 4 GBEros 1.6 GHz 8 4 GBPoseidon 2.3 GHz 8 8 GB
10/48
We assume a session-less workload, i.e., incoming requests are independent of each other
The transaction mixed is fixed to a constant proportion of browse/buy requests
CHARACTERISTICS OF THE INCOMING WORKLOAD
0 100 200 300 400 5000
0.5
1
1.5
2
2.5
3x 10
4
Time instance in 150 second increments
Num
ber o
f arri
vals
Arrivals per Time Instance, Workload 1
Silver
Gold
The workload to the computing system is time varying and shows significant variability over short time periods
11/48
APPLICATION ENVIRONMENTOnline services are enabled by enterprise applications
Application server
Database
Web Clients Trade Action
Trade Servlets
Trade Services
DB2 Database
Trade Server Pages
WebSphere Application Server
Trade6 is an example–It is transaction-based stock trading application from IBM
–It can be hosted across one or more servers in a multi-tier architecture
12/48
OUTLINEMotivation and problem statementDescription of the experimental testbedProblem formulation and controller designPerformance resultsConclusions
13/48
The power/performance management problem is posed as a dynamic resource provisioning problem under dynamic operating constraints
Objectives– Maximize the profit generated by the system (i.e., minimize SLA
violations and the power consumption cost)Decisions to be optimized
– Number of servers to turn on or off– Number of VMs to provision to each service – The CPU share given to each VM – Distribute incoming workload to different servers
PROBLEM FORMULATION
14/48
))](())(())(),(([Maximizeu
kuSkuOkukxRk
PROBLEM FORMULATION (Contd.)
0
G old S LA
S ilver SLA
Response Time, ms
Rev
enue
, dol
lars
7e -5
3e -5
-3e -5
0 100 200 300
A stepwise pricing SLA for the online services
5e -5
1e -5
-1e -5
400
Refund
Reward
Dollars generated (Revenue)–Obtained as per a (nonlinear)
reward-refund curve specified by the SLA
Reward is defined by SLA for each service class
Violation of SLA results in a refund to client
15/48
))](())(())(),(([Maximizeu
kuSkuOkukxRk
Key characteristics of the control problem– Some control actions have (long) dead times; e.g., switching on a server,
instantiating VMs, migrating VMs– Decisions must be optimized over a discrete domain– Optimization must be performed quickly, given the dynamics of input
We use a limited look-ahead control (LLC) concept
PROBLEM FORMULATION (Contd.)
Power consumption cost of operating servers
Switching costs– Opportunity cost lost due
to the unavailability of servers/VMs involved in provisioning decisions
– Transient power consumption costs
16/48
THE LLC FRAMEWORK
LLC is same as model predictive control, but in a discrete domain and quickly
Advantages– Use predictions to improve control performance– Robust (iterative feedback) even in dynamic operating conditions– Inherent compensation for dead times– Multi-objective and non-linear optimization in the discrete domain under
explicit constraints
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
C ontro linput to
evaluate
Estim atedstate
W orkloadarriva l
O bservedstate
Contro linput
17/48
THE LLC FRAMEWORK (Contd.)
],1[ :horizon Prediction
hkk
k+4k+3k+2 Use a system model to estimate future system states over a prediction horizon
k+1
Obtain an “optimal” sequence of control inputs
)1(ˆ kx
)2(ˆ kx )(ˆ hkx )3(ˆ kx
Apply the first control input in the sequence at time k +1; discard the rest
x(k)
18/48
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
Contro linput to
eva luate
Estim atedsta te
W orkloadarriva l
O bservedstate
Contro linput
WORKLOAD ESTIMATION USING A PREDICTIVE FILTER
0 50 100 150 200 250 300 350 400 450 5000
0.5
1
1.5
2
2.5
3x 10
4 Kalman Filter Workload Estimates, Workload 1
Num
ber o
f arri
vals
Time instance in 150 second increments
Silver
Gold
Kalman estimate: dotted lineActual value: solid line
A Kalman filter is used to estimate the workload over the prediction horizon
Prediction error is about 8%
Training phase
19/48
CONSTRUCTING THE SYSTEM MODEL
20 25 30 350
100
200
300
400
500
600
700
800Measured Average Response Time for the Gold Application
Workload in arrivals per second
Res
pons
e tim
e in
ms
3 GHz CPU Share VM
4.5 GHz CPU Share VM
6GHz CPU Share VM
Gold SLA
System model will base on observed state, control input and estimated workload to create new state.
The behavior of each application is captured using simulation-based learning and stored in an approximation structure (e.g., lookup table, neural network) – OFFLINE mode
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
C ontro linput to
eva luate
Estim atedsta te
W orkloadarrival
O bservedstate
C ontro linput
20/48
20 25 30 350
100
200
300
400
500
600
700
800Measured Average Response Time for the Gold Application
Workload in arrivals per second
Res
pons
e tim
e in
ms
3 GHz CPU Share VM
4.5 GHz CPU Share VM
6GHz CPU Share VM
Gold SLA
CONSTRUCTING THE SYSTEM MODEL
Example 1: Given a 3 GHz CPU share and 1 GB of memory, how many requests can a Gold VM handle before incurring SLA violations?
Average response time is below the limit doesn’t mean that no violations.
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
C ontro linput to
eva luate
Estim atedsta te
W orkloadarrival
O bservedstate
C ontro linput
21/48
20 25 30 350
100
200
300
400
500
600
700
800Measured Average Response Time for the Gold Application
Workload in arrivals per second
Res
pons
e tim
e in
ms
3 GHz CPU Share VM
4.5 GHz CPU Share VM
6GHz CPU Share VM
Gold SLA
CONSTRUCTING THE SYSTEM MODEL
Example 2: Given a 6 GHz CPU share and 1 GB of memory, how many requests can a Gold VM handle before incurring SLA violations?
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
C ontro linput to
eva luate
Estim atedsta te
W orkloadarrival
O bservedstate
C ontro linput
22/48
Observation – non-linear behavior
20 25 30 350
100
200
300
400
500
600
700
800Measured Average Response Time for the Gold Application
Workload in arrivals per second
Res
pons
e tim
e in
ms
3 GHz CPU Share VM
4.5 GHz CPU Share VM
6GHz CPU Share VM
Gold SLA
3G: 22 requests, 6G: 29 requests, why we can’t achieve 2 speedup if CPU share is 2 times??
Memory or IO is not considered????
23/48
CONSTRUCTING THE SYSTEM MODEL (Contd.)
Power = current * voltage Two observations:
– Power consumption of boot time is larger than idle state. – Power consumption having VMs is not greatly larger than idle state.
1 2 3 4 5 6 7 8 9 100
50
100
150
200
250
300
350
Host state
Pow
er c
onsu
med
in W
atts
Power Consumption per Host Machine State
Host states: 1 - Standby 2 - Boot, 1st 20 sec. 3 - Boot, remaining 2 min. 35 sec. 4 - Idle, 0 vms 5 - Boot 1st vm 6 - Idle, 1vm 7 - Boot 2nd vm 8 - Idle, 2 vms 9 - Workload on 1 vm 10 - Workload on 2 vms
Dell PowerEdge 1950
Dell PowerEdge 2950
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
C ontro linput to
eva luate
Estim atedsta te
W orkloadarrival
O bservedstate
C ontro linput
24/48
CONSTRUCTING THE SYSTEM MODEL (Contd.) - Power consumptions
Power consumption is closely related to CPU usage.
25/48
Does increase of CPU utilization increase Computer Power consumption?
The more utilization of the CPU, the more signals generated and processed by it.
Consequently, the more utilization of the CPU, the greater the energy requirement.
Power = energy x time. So we can than conclude that the greater the
CPU utilization the greater the power consumption.
26/48
Key Observations (1) Idle machine consumes 70% or more
power of full utilization Conclusion (1): Power down machine to achieve
maximum power savings.(2) Intensity of the workload at VMs doesnot
affect power consumption and cpu utilization. Conclusion (2): Only the number of VMs can
affect power consumption.
(3) Power consumed by a server is a function of instantiated VMs on it.
27/48
EXPERIMENTAL SYSTEM Six Dell servers (models 2950 and 1950) comprise the experimental testbed Virtualization of the CPU and memory is enabled by VMware ESX Server 3.0 Virtual machines run SUSE Enterprise Linux Server Edition 10 Control directives use the VMware API, Linux shell commands, and IPMI Silver application is Trade6 only; Gold application is Trade6 + extra CPU load
Host name CPU speed # of CPU cores MemoryApollo 2.3 GHz 8 8 GBBacchus 2.3 GHz 8 8 GBChronos 1.6 GHz 8 4 GBDemeter 1.6 GHz 8 4 GBEros 1.6 GHz 8 4 GBPoseidon 2.3 GHz 8 8 GB
28/48
CPU scheduling modework-conservative mode (WC-mode): in order to keep the
server resources well utilized– Under WC-mode, the shares are merely guarantees, and CPU is idle if and
only if there is no runnable work.Non-work-conservative:
– With NWC-mode, the shares are caps, i.e., each client owns its fraction of the CPU.
– It means that if one VM is assigned to 3G HZ cpu, this VM cannot use more than this even if the system is 10G HZ and no other VM at all.
Assumption: – Esx server is worked on non-work-conservative mode– Cpu assignment is not larger than maximum limit of hardware.
29/48
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
C ontro linput to
eva luate
Estim atedsta te
W orkloadarrival
O bservedstate
C ontro linput
DEVELOPING THE OPTIMIZER Issue 1: Risk-aware control
– Due to the energy and opportunity costs incurred when switching hosts and VMs on/off, excessive switching caused by workload variability may actually reduce profits
– We need to encode a notion of risk in the cost function
Cost that accumulates during the time a server is being turned on but is unavailable to perform any useful service
30/48
Environment-input estimates will have prediction errors
We encode a notion of risk in the optimization problem– Generate a set of expected next states for lots of the predicted environment inputs
RISK-AWARE CONTROL
)()(ˆ)(ˆ)()(ˆ jjjjj iiiii
Construct an uncertainty bound for environment input of interest:
Averaged past observed error between actual and forecasted arrival rate
Estimated environment
input
31/48
RISK-AWARE CONTROL (Contd.) A utility function encodes risk into the objective function
hk
kj iiiiiujujXU
1
2
}{)(),(ˆmax
Maximize utility over horizon and client classes
Utility model with tunable risk-preference parameter, β
Apply a mean-square variance model of utility
Formulate a utility maximization problem
β = 0 : risk-neutral
β > 0 : risk-averseβ < 0 : risk-seeking
Tunable risk-preference parameter, β
Uncertainty as variance
A > 2*Mean
ijXjXjXAR iiii
2)(ˆmean)(ˆvar)(ˆmeanU
32/48
System
Predictivefilter
Systemm odel
System O ptim izer
W orkloadforecast
Contro linput to
eva luate
Estim atedsta te
W ork loadarrival
O bservedstate
Contro linput
DEVELOPING THE OPTIMIZER (Contd.)
We use a control hierarchy to reduce execution-time overhead– An L0 controller decides the CPU share to assign to VMs
– An L1 controller decides the number of VMs for each service and the number of servers to keep powered on
– The average execution time of the L1 controller is about 10 seconds
Issue 2: Execution-time overhead of the controller
– “Curse of dimensionality” - Problem will show an exponential increase in worst-case complexity with more control options and longer prediction horizons
33/48
OUTLINEMotivation and problem statementDescription of the experimental testbedProblem formulation and controller designPerformance resultsConclusions
34/48
EXPERIMENTAL SYSTEM Six Dell servers (models 2950 and 1950) comprise the experimental testbed Virtualization of the CPU and memory is enabled by VMware ESX Server 3.0 Virtual machines run SUSE Enterprise Linux Server Edition 10 Control directives use the VMware API, Linux shell commands, and IPMI Silver application is Trade6 only; Gold application is Trade6 + extra CPU load
Host name CPU speed # of CPU cores MemoryApollo 2.3 GHz 8 8 GBBacchus 2.3 GHz 8 8 GBChronos 1.6 GHz 8 4 GBDemeter 1.6 GHz 8 4 GBEros 1.6 GHz 8 4 GBPoseidon 2.3 GHz 8 8 GB
35/48
EXPERIMENTAL PARAMETERS
Parameter Value
Cost per KiloWatt hour $0.3
Time delay to power on a VM 1 min. 45 sec.
Time delay to power on a host 2 min. 55 sec.
Prediction horizon L1: 3 steps, L0: 1 step
Control sampling period L1: 150 sec, L0: 30 sec
Initial configuration for Gold service (application tier) 3 VMs
Initial configuration for Silver Service (application tier) 3 VMs
36/48
MAIN RESULTSA risk-neutral controller conserves, on average, 26% more
energy than a system without dynamic control with very few SLA violations
Workload Energy savings % of SLA violations (Silver)
% of SLA violations (Gold)
Workload 1 18% 3.2% 2.3%
Workload 2 17% 1.2% 0.5%
Workload 3 17% 1.4% 0.4%
Workload 4 45% 1.1% 0.2%
Workload 5 32% 3.5% 1.8%
More SLA violations for Silver requests than for Gold requests.
37/48
RESULTS (Contd.)
0 500 1000 1500 2000 2500
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
x 104
Tota
l CP
U c
ycle
s pe
r sec
ond
Time instance in 30 second increments
Total CPU Speed, Gold Application , Workload 1
0 500 1000 1500 2000 2500
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8x 10
4 Total CPU Speed, Silver Application, Workload 1
Time instance in 30 second increments
Tota
l CP
U c
ycle
s pe
r sec
ond
CPU shares assigned to the Gold and Silver applications over a 24-hour period – L0 layer
38/48
RESULTS (Contd.)Number of virtual machines assigned to the Gold and Silver
applications over a 24-hour period – L1 layer
100 200 300 400 5000
1
2
3
Time instance in 150 second increments
Num
ber o
f virt
ual m
achi
nes
on
Switching Activity, Gold Application, Workload 1
100 200 300 400 5000
1
2
3
Num
ber o
f virt
ual m
achi
nes
on
Time instance in 150 second increments
Switching Activity, Silver Application, Workload 1
39/48
EFFECT OF THE RISK PREFERENCE PARAMETER
A risk-averse ( = 2) controller conserves about the same amount of energy as a risk-neutral ( = 0) controller
WorkloadEnergy savings (risk-
neutral control) ( = 0)
Energy savings (risk-averse control)
( = 2)
Workload 6 20.8 % 20.9 %
Workload 7 25.3 % 25.2 %
40/48
EFFECT OF THE RISK PREFERENCE PARAMETER (Contd.)
A risk-averse controller ( = 2) maintains a higher QoS (Less violations) than a risk-neutral ( = 0) controller by reducing switching activity
WorkloadSLA violations (risk-
neutral control) ( = 0)
SLA violations (risk-averse control)
( = 2)
% reduction in SLA violations
Workload 6 28,635 (2.3%) 15,672 (1.7%) 45%
Workload 7 34,201 (2.7%) 25,606 (2.0%) 25%
WorkloadSwitching activity (risk-neutral control)
( = 0)
Switching activity (risk-averse control)
( = 2)
% reduction in switching activity
Workload 6 30 28 7%
Workload 7 40 30 25%
Best-case risk-averse controller:
41/48
OPTIMALITY CONSIDERATIONSThe controller cannot achieve optimal performance
– Limited by errors in workload predictions– Limited by constrained control inputs– Limited by a finite prediction horizon
Controller Total Energy Savings Total SLA violations Num. times hosts
switchedRisk neutral 25.3% 34,201 (2.7%) 40Risk averse 25.2% 25,606 (2.0%) 38
Oracle 16.3% 14,228 (1.1%) 32
To evaluate optimality, profit gains of a risk-neutral and best-case risk-averse controller were compared against an “oracle” controller with perfect knowledge of the future
42/48
CONCLUSIONSWe have addressed power and performance management in a
virtualized computing environment within a LLC frameworkThe cost of control and the notion of risk is encoded explicitly in
the problem formulation A server cluster managed using LLC saves 26% in power-
consumption costs over a 24 hour period when compared to an uncontrolled system
Power savings are achieved with very few SLA violations (1.6% of the total number of requests)
Our recommendation is a risk-averse controller since it reduces SLA violations and switching activity
43/48
Conclusion (1) – Why significant?
Why significant?– Using virtualization, implement a dynamic
resource provisioning model– Integrate power and performance
management, reduce energy cost (26%) while causing little SLA( service level agreement) violation (less than 3%)
44/48
Conclusion(2) - Alternate approach?
Alternate approach?Technique
EfficiencyImpact
• 3-5%
Deploy virtualization for existing and new demand
• 25-30%
Implement free cooling • 0-15%
Introduce greener and more power efficient servers
• 10-20%
Selectively turn off core components to increase remaining unit efficiency
45/48
Conclusion (3) – Improvement?
Simplify the control logic to reduce the exe. time
Integrate the memory usage when modify VM configuration
Provide a mechanism to decide the granularity to create VMs – One 6G VM can handle more requests than that of two 3G VMs.
46/48
SCALABILITYExecution time of the controller can be reduced
through various techniques– Approximating control– Implementing the controller in hardware– Increasing the number of tiers in the control hierarchy– Simplifying the iterative search process to “hold” a control
input constant over the prediction horizon
System
Pred ictivefilter
Systemm odel
System O ptim izer
A neural network or regression tree can be trained to learn the decision-making behavior of the optimizer
47/48
Scalability problem
Scalability is not good, current result is based on 5 hosts only. But there can be dozens or thousands of servers in actual data center.
» 5 hosts - < 10 sec»10 hosts - 2min. 30 sec»15 hosts – 30 min.
48/48
Questions?
Thank you!