AUTOMATED CONTROL FOR ELASTIC STORAGE
Harold C. Lim, Shivnath Babu and Jeffrey S. Chase, Duke University
Presented by: Yonggang Liu, Department of Electrical and Computer Engineering, University of Florida


Outline
- Introduction
- System overview
- System architecture and modeling methodologies
- Evaluation
- Contribution and related work
- Discussions and future work


Introduction - Popularity of highly dynamic workloads
- Many web-based services (especially Web 2.0) often experience rapid load surges and drops.
  - One Facebook application grew from 25,000 to 250,000 users in 3 days, with up to 20,000 new users signing up per hour during peak times.
- Elastic services offered by cloud computing are one solution.
  - Grow or shrink service capacity dynamically as the load changes.


Introduction - Elasticity in cloud computing

Elasticity is one of cloud computing’s greatest features – Systems acquire and release resources in response to users’ dynamic workloads; users only pay for what they need.

[Figure: cloud computing diagram with the labels SLAs, Web Services and Virtualization. Picture provided by Dr. Andy Li from UF.]


Introduction - Topic of this paper
- This paper addresses the challenges of controlling elastic storage for a data-intensive service in a cloud computing environment.
- Intuitively, the controller does the following:
  - If performance cannot meet the Service Level Objective (SLO) → grow storage capacity.
  - If performance meets the SLO and system utilization is low → shrink storage capacity.


Introduction - Topic of this paper
- In this paper, the Hadoop Distributed File System (HDFS) is employed as the storage system.
- When the controller increases the storage size:
  - Create new storage instances.
  - Move stored data to the new instances (data rebalancing).
- When the controller reduces the storage size:
  - Remove a certain number of storage instances.
  - Some data on the remaining nodes gets re-replicated because its replica count has dropped below the replication degree N. HDFS does this automatically.


System overview - What is the big picture
Suppose we are hosting the Facebook server on Amazon EC2 instances, with the proposed control techniques.
[Figure: Clients access an Elastic Service running on a Cloud Provider (Amazon EC2). The Elastic Service has three tiers: Web Tier (Apache server), Application Tier (Facebook core) and Storage Tier (Hadoop DFS). A Controller is attached to the service through a Sensor, which gathers measurements, and an Actuator, which manages instances.]
- The controller senses higher system load → create more storage instances and rebalance data.
- The controller senses lower system load → remove some storage nodes.


System overview - Challenges in elastic storage control
Controlling elastic storage involves several challenges:
- Data rebalancing. Newly added storage nodes are not effective until data rebalancing is done.
- Interference with the guest service. Data rebalancing also consumes system resources.
- Actuator delay. The controller must account for the delay of control operations; otherwise it may respond too late or become unstable.


System architecture
The controller is composed of:
- Horizontal Scale Controller (HSC): responsible for growing and shrinking the number of storage nodes.
- Data Rebalance Controller (DRC): controls the data transfers that rebalance the storage tier after it grows or shrinks.
- State machine: coordinates the actions of the HSC and the DRC.


System architecture - Horizontal Scale Controller (HSC)
- Actuator: the HSC uses cloud APIs to change the number of active server instances.
- Sensor: the paper uses CPU utilization on the storage nodes as the sensor feedback metric.
  - It is easy to measure, and it is strongly correlated with the overall response time of the Cloudstone benchmark when the bottleneck is in the storage tier.


Modeling methodology - System model without controller
The system without a controller can be described by this block diagram:
[Figure: open-loop block diagram. U(z) and D(z) are summed to give V(z), which passes through G(z) to produce Y(z), i.e. $V(z) = U(z) + D(z)$ and $Y(z) = G(z)\,V(z)$.]
- U(z): the input to the system, the number of storage instances.
- D(z): the effect of client workload variance, expressed as a change in the number of storage instances.
- V(z): the effective number of storage instances.
- Y(z): the output of the system, the CPU utilization on the storage nodes.
- G(z): the transfer function of the storage system.


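Modeling methodology - Controller - Integral control
Control policy K(z): integral control
$u_{k+1} = u_k + K_i \, (y_{ref} - y_k)$
- $u_k$, $u_{k+1}$: the current and new actuator values (number of storage instances).
- $K_i$: the integral gain parameter.
- $y_k$: the current sensor measurement.
- $y_{ref}$: the desired reference sensor measurement, which is 20% CPU utilization, corresponding to a 3-second average response time.
[Figure: closed-loop block diagram. The error $E(z) = R(z) - Y(z)$ enters the controller K(z), which outputs U(z); $V(z) = U(z) + D(z)$ enters G(z), which outputs Y(z).]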
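The integral control loop can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the toy plant (per-node CPU utilization equals total load divided by node count), the gain value, the starting node count, and all function names are assumptions made for the example. The node count is treated as a continuous value here.

```python
def integral_step(u_k, y_k, y_ref, k_i):
    """One step of integral control: u_{k+1} = u_k + K_i * (y_ref - y_k)."""
    return u_k + k_i * (y_ref - y_k)


def simulate(total_load, y_ref=20.0, k_i=-0.1, u0=2.0, steps=50):
    """Run the loop against a toy plant where utilization = total_load / nodes.

    total_load is expressed in "percent of one node's CPU" (assumed model).
    k_i is negative because adding nodes lowers utilization in this plant.
    """
    u = u0
    history = []
    for _ in range(steps):
        y = total_load / u                              # sensor reading (%)
        u = max(1.0, integral_step(u, y, y_ref, k_i))   # actuator, at least 1 node
        history.append((u, y))
    return history


if __name__ == "__main__":
    final_nodes, final_util = simulate(total_load=200.0)[-1]
    print(f"nodes={final_nodes:.2f}, utilization={final_util:.1f}%")
```

With these assumed values the loop settles where the load of 200 is spread over 10 nodes at the 20% reference. A larger gain magnitude reacts faster but can overshoot, which is why the gain has to be tuned against the real system.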


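Modeling methodology - Controller - discrete control functions
Because the actuator is discrete (a whole number of instances), the paper uses the following discrete control function:
$$u_{k+1} = \begin{cases} u_k + K_i \, (y_h - y_k) & \text{if } y_k > y_h \\ u_k + K_i \, (y_l - y_k) & \text{if } y_k < y_l \\ u_k & \text{otherwise} \end{cases}$$
- $y_h$ and $y_l$ are the higher and lower thresholds for the CPU utilization $y_k$; $u_k$ is the number of storage instances and $K_i$ is the integral gain.
- Only when $y_k > y_h$ (under-provisioned) or $y_k < y_l$ (over-provisioned) does $u_{k+1} \neq u_k$, i.e., the controller adds or removes storage instances.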
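A threshold-gated control step of this kind might look like the sketch below. The function name, the rounding to whole instances, and the rule that a threshold crossing changes the count by at least one instance are assumptions for illustration; the slide text does not preserve those details.

```python
def threshold_step(u_k, y_k, y_low, y_high, k_i):
    """Return the next instance count under threshold-gated integral control.

    The controller acts only when the measured CPU utilization y_k leaves
    the band [y_low, y_high]; inside the band the count is left unchanged.
    k_i is expected to be negative (more instances means lower utilization).
    """
    if y_k > y_high:                                  # under-provisioned
        change = round(k_i * (y_high - y_k))
        return u_k + max(1, change)                   # add at least one instance
    if y_k < y_low:                                   # over-provisioned
        change = round(k_i * (y_low - y_k))
        return max(1, u_k + min(-1, change))          # remove one or more, keep >= 1
    return u_k                                        # within the band: no action


if __name__ == "__main__":
    for utilization in (60.0, 15.0, 5.0):
        print(utilization, threshold_step(4, utilization, 10.0, 20.0, -0.1))
```

Keeping a dead band between the two thresholds is what stops the controller from reacting to every small fluctuation with a whole-instance change.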


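Modeling methodology - Proportional thresholding
How should $y_h$ and $y_l$ be set?
- They cannot be static, because for a cluster of size N, adding or removing a node changes the total capacity by 1/N.
“Proportional thresholding” mechanism:
- Set $y_h = y_{ref}$, and vary $y_l$ to vary the range.
- Suppose “workload” is the per-node workload and we have N instances. We get: [equation not preserved in the slide text]
- Suppose [condition not preserved in the slide text], we get: [equation not preserved in the slide text]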
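One way to make the dependence on N concrete is the short derivation below. It is an assumed reconstruction of the argument, not the equations from the slide: it supposes that per-node CPU utilization equals the total load divided by the number of nodes, writes $y_h$ and $y_l$ for the high and low utilization thresholds, and requires that removing one node at the low threshold must not push utilization above the high threshold.

```latex
\begin{align*}
\text{total load at the low threshold} &= N \, y_l \\
\text{utilization after removing one node} &= \frac{N \, y_l}{N - 1} \le y_h \\
\Longrightarrow \quad y_l &\le \frac{N - 1}{N} \, y_h
\end{align*}
```

Under this assumption, with $y_h = 20\%$ a cluster of N = 2 needs $y_l \le 10\%$, while a cluster of N = 10 can use $y_l \le 18\%$. The band narrows as the cluster grows, which matches the observation that a single node is 1/N of the total capacity.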


System architecture - Data Rebalance Controller (DRC)
- The DRC rebalances the layout of data in the system after the number of storage nodes grows or shrinks.
- Rebalancing is a cause of actuator delay and interference.
- Tuning knob of the HDFS rebalancer: the bandwidth b allocated to the rebalancer.
- Select b to control the tradeoff between lag and interference:
  - Large b: fast rebalance, but serious impact on normal service.
  - Small b: slow rebalance, but little disruption to normal service.


Modeling Methodology - Modeling the impacts of b
The paper uses multivariate regression to build two models that guide the choice of b:
- The time to complete rebalancing (Time) as a function of the bandwidth throttle (b) and the size of the data to be moved (s): $Time = f_t(b, s)$.
- The impact of rebalancing on service response time (Impact) as a function of the bandwidth throttle (b) and the per-node workload (l): $Impact = f_i(b, l)$.
- The values of s and l are measured by sensors in the DRC.


Modeling Methodology - Balancing between lag and interference
The Data Rebalance Controller poses the choice of b as a cost-based optimization problem:
$\min_b \; Cost = A \cdot Time + B \cdot Impact$
The ratio A/B of the two weights can be specified by the guest based on its relative preference for Time over Impact.
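The cost-based choice of b can be illustrated with a small search over candidate throttle values. Everything concrete below is an assumption: the two model functions are made-up stand-ins for the regression models (chosen only so that Time falls and Impact rises as b grows), and the function names, units and weights are hypothetical.

```python
def rebalance_time(b, s):
    """Hypothetical Time model: seconds to move s MB at a throttle of b MB/s."""
    return s / b


def service_impact(b, load):
    """Hypothetical Impact model: grows with bandwidth and per-node load."""
    return 0.05 * b * (1.0 + load)


def choose_bandwidth(s, load, weight_time, weight_impact, candidates):
    """Pick the candidate b that minimizes weight_time*Time + weight_impact*Impact."""
    def cost(b):
        return (weight_time * rebalance_time(b, s)
                + weight_impact * service_impact(b, load))
    return min(candidates, key=cost)


if __name__ == "__main__":
    options = [1, 2, 5, 10, 20, 50]   # candidate throttles in MB/s (assumed)
    print(choose_bandwidth(s=1000, load=1.0, weight_time=1.0,
                           weight_impact=20.0, candidates=options))
```

Raising the weight on Time pushes the choice toward a larger b, and raising the weight on Impact pushes it toward a smaller b, which is the tradeoff the guest expresses through the weight ratio.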


System architecture - State machine
Recall that:
- The Horizontal Scale Controller (HSC) grows or shrinks the number of storage nodes.
- The Data Rebalance Controller (DRC) rebalances the storage after the number of storage nodes changes.
They have mutual dependencies:
- After the HSC adds a new storage node, the node cannot deliver its full service until the DRC completes rebalancing.
- When one controller is acting, it introduces noise into the sensor measurements of the other.
To preserve stability during adjustments, a state machine coordinates the HSC and the DRC and manages their mutual dependencies.


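System architecture - State machine
The following diagram shows the internal state machine of the elasticity controller in the storage tier.
[Figure: state machine of the Elasticity Controller, which manages the Storage Tier. States: Init, Horizontal Scale State, Rebalance State.]
- Horizontal Scale State: if the storage tier configuration has not changed, stay in this state; if it has changed, move to the Rebalance State.
- Rebalance State: if rebalancing is not done, stay in this state; once it is done, return to the Horizontal Scale State.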
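The transitions in the diagram can be written down as a small transition function. This is a sketch under assumptions: the names are hypothetical, and the step from Init straight into the horizontal scale state is inferred from the diagram labels rather than stated in the slide.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    HORIZONTAL_SCALE = "horizontal_scale"
    REBALANCE = "rebalance"


def next_state(state, config_changed=False, rebalance_done=False):
    """Return the controller's next state for the current observations."""
    if state is State.INIT:
        # Assumed: the controller starts by running the HSC.
        return State.HORIZONTAL_SCALE
    if state is State.HORIZONTAL_SCALE:
        # The HSC stays active until it changes the storage tier configuration.
        return State.REBALANCE if config_changed else State.HORIZONTAL_SCALE
    if state is State.REBALANCE:
        # The DRC stays active until rebalancing completes.
        return State.HORIZONTAL_SCALE if rebalance_done else State.REBALANCE
    raise ValueError(f"unknown state: {state!r}")


if __name__ == "__main__":
    s = next_state(State.INIT)
    s = next_state(s, config_changed=True)    # a node was added or removed
    s = next_state(s, rebalance_done=False)   # still moving data
    print(s)
```

Because only one of the two controllers is active in any state, neither one acts on sensor readings that the other is currently disturbing.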


Evaluation - Experimental Testbed
- The paper runs Cloudstone with GlassFish as the front-end application server tier.
  - Cloudstone: a flexible Web 2.0 benchmark generator.
  - GlassFish: an open source application server project.
- HDFS is used for the storage tier.
  - HDFS is modified to expose the rebalancer’s bandwidth throttle b as an actuator to the external controller.
- The paper uses a local ORCA cluster as the cloud infrastructure provider.
  - ORCA: a resource control framework that provides a resource leasing service; guests can lease resources from a substrate resource provider, such as a cloud provider.


Evaluation - Experimental Testbed
- The experimental service cluster: a group of servers running on a local network.
- To isolate the effects of the storage tier, the other tiers are statically over-provisioned.
- The storage tier nodes:
  - Dynamically allocated virtual machine instances.
  - They all have a fixed resource configuration: 30 MB disk space; 512 MB RAM; single disk arm; 2.8 GHz CPU.
- HDFS is preloaded with at least 36 GB of data.


Evaluation - Controller Effectiveness
Static and dynamic resource provisioning under a tenfold load burst.
[Figure panels: (a1) CPU utilization, static; (b1) response time, static; (a2) CPU utilization, dynamic; (b2) response time, dynamic.]
Target response time: 3 seconds. Target CPU utilization: 20%.
From the figures:
1. Dynamic provisioning is able to adapt to the load burst.
2. Instance creation and data rebalancing incur a cost and a delay before they take effect.


Evaluation - Controller Effectiveness
Static and dynamic resource provisioning under a small load increase of 35%.
[Figure panels: (a1) CPU utilization, static; (b1) response time, static; (a2) CPU utilization, dynamic; (b2) response time, dynamic.]
Target response time: 3 seconds. Target CPU utilization: 20%.
From the figures:
1. Dynamic provisioning is responsive enough to adapt to the small load increase.
2. The cost and delay of node creation and rebalancing are smaller than in the previous case.


Evaluation - Resource Efficiency
Static and dynamic resource provisioning under a load decrease of 30%.
[Figure panels: (a1) CPU utilization, static; (b1) response time, static; (a2) CPU utilization, dynamic; (b2) response time, dynamic.]
Target response time: 3 seconds. Target CPU utilization: 20%.
From the figures:
1. Shrinking the storage size has a much lower cost and delay than increasing it.
2. During the resizing process, there are almost no SLO violations.


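Evaluation - Comparison of Rebalance Policies
Recall that:
- $Time = f_t(b, s)$ is a monotone decreasing function of b.
- $Impact = f_i(b, l)$ is a monotone increasing function of b.
We want to choose b to minimize the cost function, where A and B are the weights specified by the guest:
$Cost = A \cdot Time + B \cdot Impact$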


Contribution and related work
- This paper is the first to address the problem of automated control for elastic storage in cloud computing.
- SCADS is related work on dynamically scaling a storage system. It uses machine learning to predict resource requirements.
- Padala et al. proposed a decoupled architecture (between guest and cloud provider) for cloud computing. They did not consider actuator constraints.
- Aqueduct uses a feedback controller to throttle the bandwidth used by rebalancing so that SLOs are not violated. As a result, rebalancing may end up with very little bandwidth.


Discussions and future work
- The proposed modeling method cannot correctly handle workloads with transient noise, which are common in practice.
- Adding a filter module solves the problem:
[Figure: the closed-loop block diagram with R(z), E(z), K(z), U(z), D(z), V(z), G(z) and Y(z), extended with a filter block H(z) whose signal is labeled W(z).]
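The slide does not say which filter H(z) is meant. As one common choice, assumed here purely for illustration, an exponentially weighted moving average can smooth the sensor readings before they reach the controller. The function name and the value of alpha are hypothetical.

```python
def ewma_filter(measurements, alpha=0.3):
    """Smooth a sequence with an exponentially weighted moving average.

    alpha lies in (0, 1]: smaller values smooth more but react more slowly.
    The first output equals the first measurement.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for y in measurements:
        previous = y if previous is None else alpha * y + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


if __name__ == "__main__":
    # A single transient spike to 90% in an otherwise steady 20% signal.
    raw = [20.0, 20.0, 90.0, 20.0, 20.0]
    print([round(v, 1) for v in ewma_filter(raw)])
```

The filtered signal still rises during the spike, but by much less than the raw reading, so a threshold-based controller is less likely to add instances for a one-sample burst. The price is extra lag in reacting to genuine load changes.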


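Discussions and future work
- The proposed model allocates resources tightly. A small change in system load often triggers adding or removing storage instances, which is very disruptive.
- Recall the proposed control function, with high and low CPU utilization thresholds $y_h$ and $y_l$:
$$u_{k+1} = \begin{cases} u_k + K_i \, (y_h - y_k) & \text{if } y_k > y_h \\ u_k + K_i \, (y_l - y_k) & \text{if } y_k < y_l \\ u_k & \text{otherwise} \end{cases}$$
- By setting $y_l$ lower or $y_h$ higher (up to an upper limit), we prevent the system from resizing too frequently.
- The drawback of this approach: the system will be under-provisioned to some extent.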


Discussions and future work
- Make the resource configuration of newly created storage instances tunable.
  - Resize the storage tier by adding or removing storage instances with flexible resource configurations.
  - Optimize the system by exploring the capacity and efficiency of individual storage instances, rather than only the number of storage instances.
  - This requires investigating the performance of storage nodes under different setups: disk size, CPU frequency, RAM size, etc.


THANK YOU!