The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella 1

1

The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers

Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella

2

Cloud Computing is Hot

Private Cluster

3

Key Factors for Cloud Viability

• Cost

• Performance

4

Performance Variability in Cloud

• BW variation in cloud due to contention [Schad’10 VLDB]

• Causing unpredictable performance

[Figure: bandwidth (Mbps, axis from 0 to 1000) for Local Cluster vs. Amazon EC2]

5

Reserving BW in Data Centers

• SecondNet [Guo’10]
– Per VM-pair, per VM access bandwidth reservation

• Oktopus [Ballani’11]
– Virtual Cluster (VC)
– Virtual Oversubscribed Cluster (VOC)

6

How BW Reservation Works

[Figure: Virtual Cluster Model: N VMs attached to a virtual switch; bandwidth vs. time plot with a constant reservation B from time 0 to T]

1. Determine the model
2. Allocate and enforce the model

Request <N, B>: only fixed-BW reservation

7

Network Usage for MapReduce Jobs

Hadoop Sort, 4GB per VM

Hadoop Word Count, 2GB per VM

Hive Join, 6GB per VM

Hive Aggregation, 2GB per VM

Time-varying network usage

8

Motivating Example

• 4 machines, 2 VMs/machine, non-oversubscribed network

• Hadoop Sort
– N: 4 VMs
– B: 500Mbps/VM

[Figure: 1Gbps link with 500Mbps reservations; annotation: not enough BW]

9

Motivating Example

• 4 machines, 2 VMs/machine, non-oversubscribed network

• Hadoop Sort
– N: 4 VMs
– B: 500Mbps/VM

[Figure: 1Gbps link with a 500Mbps reservation]

10

Under Fixed-BW Reservation Model

[Figure: bandwidth vs. time (0 to 30) on a 1Gbps link; labels: 500Mbps, Job1, Job2, Job3]

11

Under Time-Varying Reservation Model

[Figure: bandwidth vs. time (0 to 30) on a 1Gbps link under the TIVC model for Hadoop Sort; labels: 500Mbps, Job1 to Job5 (J1 to J5)]

Doubles VM utilization, network utilization, and job throughput

12

Temporally-Interleaved Virtual Cluster (TIVC)

• Key idea: Time-Varying BW Reservations

• Compared to fixed-BW reservation
– Improves utilization of data center
  • Better network utilization
  • Better VM utilization
– Increases cloud provider’s revenue
– Reduces cloud user’s cost
– Without sacrificing job performance

13

Challenges in Realizing TIVC

[Figure: N VMs attached to a virtual switch. First plot: bandwidth vs. time with constant B from 0 to T, request <N, B>. Second plot: bandwidth vs. time from 0 to T with peak B, request <N, B(t)>]

Q1: What are right model functions?

Q2: How to automatically derive the models?

14


Q3: How to efficiently allocate TIVC?

Q4: How to enforce TIVC?

15


• What are the right model functions?

• How to automatically derive the models?

• How to efficiently allocate TIVC?

• How to enforce TIVC?

16


• What are the right model functions?




17

How to Model Time-Varying BW?

Hadoop Hive Join

18

TIVC Models

[Figure: TIVC model types shown alongside the Virtual Cluster model; time labels T11 and T32]

19

Hadoop Sort

20

Hadoop Word Count


21

Hadoop Hive Join

22

Hadoop Hive Aggregation

23


What are the right model functions?




24

Possible Approach

• “White-box” approach
– Given source code and data of cloud application, analyze quantitative networking requirement
– Very difficult in practice

• Observation: Many jobs are repeated many times
– E.g., 40% of jobs are recurring in Bing’s production data center [Agarwal’12]
– Of course, data itself may change across runs, but size remains about the same

25

Our Approach

• Solution: “Black-box” profiling based approach
1. Collect traffic trace from profiling run
2. Derive TIVC model from traffic trace

• Profiling: Same configuration as production runs
– Same number of VMs
– Same input data size per VM
– Same job/VM configuration

How much BW should we give to the application?

26

Impact of BW Capping

No-elongation BW threshold

27

Choosing BW Cap

• Tradeoff between performance and cost
– Cap > threshold: same performance, costs more
– Cap < threshold: lower performance, may cost less

• Our Approach: Expose tradeoff to user
1. Profile under different BW caps
2. Expose run times and cost to user
3. User picks the appropriate BW cap

Only below threshold ones

28

From Profiling to Model Generation

• Collect traffic trace from each VM
– Instantaneous throughput of 10ms bin

• Generate models for individual VMs

• Combine to obtain overall job’s TIVC model
– Simplify allocation by working with one model
– Does not lose efficiency since per-VM models are roughly similar for MapReduce-like applications
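The trace-collection step above can be sketched as follows. This is a minimal illustration, assuming the trace is available as (timestamp in seconds, packet size in bytes) records; the function name `throughput_mbps` and the sample packets are hypothetical and not taken from the slides.

```python
from typing import Iterable, List, Tuple

BIN_SECONDS = 0.010  # 10 ms bins, as stated on the slide


def throughput_mbps(packets: Iterable[Tuple[float, int]], duration: float) -> List[float]:
    """Aggregate (timestamp, bytes) records into per-bin throughput in Mbps."""
    n_bins = int(round(duration / BIN_SECONDS))
    bytes_per_bin = [0] * n_bins
    for timestamp, size in packets:
        idx = int(timestamp / BIN_SECONDS)
        if 0 <= idx < n_bins:
            bytes_per_bin[idx] += size
    # bytes -> bits, divided by bin length, scaled to megabits per second
    return [b * 8 / BIN_SECONDS / 1e6 for b in bytes_per_bin]


# Hypothetical 30 ms trace from one VM.
packets = [(0.001, 1250), (0.004, 1250), (0.015, 625), (0.027, 1250)]
rates = throughput_mbps(packets, duration=0.03)
print(rates)
```

Each VM's list of per-bin rates would then be the input to per-VM model generation.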

29

Generate Model for Individual VM

1. Choose Bb

2. Periods where B > Bb, set to Bcap

[Figure: BW vs. time, with levels Bb and Bcap marked]
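The two steps above can be sketched in a few lines. This is a minimal illustration, assuming the trace is a list of per-bin throughput samples in Mbps and assuming that periods at or below Bb are reserved at Bb (the slide only states what happens above Bb). The name `pulse_model` and the sample values are hypothetical.

```python
from typing import List


def pulse_model(trace: List[float], b_base: float, b_cap: float) -> List[float]:
    """Per-bin reservation: b_cap where usage exceeds b_base, else b_base (assumed)."""
    return [b_cap if sample > b_base else b_base for sample in trace]


# Hypothetical per-bin throughput of one VM, in Mbps.
trace = [20.0, 30.0, 480.0, 500.0, 450.0, 25.0, 10.0]
model = pulse_model(trace, b_base=50.0, b_cap=500.0)
print(model)  # [50.0, 50.0, 500.0, 500.0, 500.0, 50.0, 50.0]
```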

30

Maximal Efficiency Model

• $\text{Efficiency} = \dfrac{\text{Application Traffic Volume}}{\text{Reserved Bandwidth Volume}}$

• Enumerate Bb to find the maximal efficiency model

[Figure: BW vs. time, with levels Bb and Bcap marked]
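A small sketch of the enumeration: efficiency is taken as application traffic volume divided by reserved bandwidth volume, and each candidate Bb is scored. The assumptions here are that periods at or below Bb are reserved at Bb, and that the candidates are simply the distinct sample values; the function names and the trace are hypothetical.

```python
from typing import Iterable, List


def efficiency(trace: List[float], b_base: float, b_cap: float) -> float:
    """Application traffic volume divided by reserved bandwidth volume."""
    reserved = sum(b_cap if s > b_base else b_base for s in trace)
    used = sum(min(s, b_cap) for s in trace)
    return used / reserved if reserved > 0 else 0.0


def best_base(trace: List[float], b_cap: float, candidates: Iterable[float]) -> float:
    """Enumerate candidate Bb values and return the one with maximal efficiency."""
    return max(candidates, key=lambda bb: efficiency(trace, bb, b_cap))


# Hypothetical per-bin throughput of one VM, in Mbps.
trace = [20.0, 30.0, 480.0, 500.0, 450.0, 25.0, 10.0]
bb = best_base(trace, b_cap=500.0, candidates=sorted(set(trace)))
print(bb, round(efficiency(trace, bb, 500.0), 3))  # 30.0 0.935
```

A very low Bb turns almost every bin into a pulse reserved at Bcap, while a very high Bb reserves a large base everywhere, so the maximum tends to sit in between.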

31



How to automatically derive the models?



32

TIVC Allocation Algorithm

• Spatio-temporal allocation algorithm
– Extends VC allocation algorithm to time dimension
– Employs dynamic programming

• Properties
– Locality aware
– Efficient and scalable
  • 99th percentile 28ms on a 64,000-VM data center in scheduling 5,000 jobs
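To illustrate only the time dimension of allocation, the sketch below checks whether a time-varying reservation fits on a single link and finds the earliest feasible start slot. It is not the dynamic-programming algorithm from the talk: it ignores VM placement and the tree topology, and the `Link` class, the `schedule` helper, and the profiles are all hypothetical.

```python
from typing import List, Optional


class Link:
    """A single link with a per-slot record of reserved bandwidth (Mbps)."""

    def __init__(self, capacity: float, horizon: int) -> None:
        self.capacity = capacity
        self.reserved = [0.0] * horizon

    def fits(self, profile: List[float], start: int) -> bool:
        if start + len(profile) > len(self.reserved):
            return False
        return all(self.reserved[start + i] + bw <= self.capacity
                   for i, bw in enumerate(profile))

    def earliest_start(self, profile: List[float]) -> Optional[int]:
        for start in range(len(self.reserved) - len(profile) + 1):
            if self.fits(profile, start):
                return start
        return None

    def reserve(self, profile: List[float], start: int) -> None:
        for i, bw in enumerate(profile):
            self.reserved[start + i] += bw


def schedule(link: Link, profiles: List[List[float]]) -> List[Optional[int]]:
    """Greedily place each profile at its earliest feasible start slot."""
    starts = []
    for profile in profiles:
        start = link.earliest_start(profile)
        if start is not None:
            link.reserve(profile, start)
        starts.append(start)
    return starts


pulse = [100.0, 100.0, 900.0, 100.0]  # hypothetical time-varying profile
fixed = [900.0] * 4                   # same job reserved at its peak
tivc_starts = schedule(Link(1000.0, 12), [pulse] * 3)
fixed_starts = schedule(Link(1000.0, 12), [fixed] * 3)
print(tivc_starts, fixed_starts)  # [0, 1, 4] [0, 4, 8]
```

With the time-varying profile, the three jobs overlap in time because their peaks do not coincide; with peak reservations they run strictly one after another.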

33




How to efficiently allocate TIVC?


34

Enforcing TIVC Reservation

• Possible to enforce completely in hypervisor
– Does not have control over upper level links
– Requires online rate monitoring and feedback
– Increases hypervisor overhead and complexity

• Observation: Few jobs share a link simultaneously
– Most small jobs will fit into a rack
– Only a few large jobs cross the core
– In our simulations, < 26 jobs share a link in 64,000-VM data center

35

Enforcing TIVC Reservation

• Enforcing BW reservation in switches
– Avoid complexity in hypervisors
– Can be implemented on commodity switches
  • Cisco Nexus 7000 supports 16k policers
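One way to think about enforcing a time-varying reservation with a rate policer is as a short list of rate changes at pulse boundaries. The sketch below derives such a list from a per-slot reservation. It is purely illustrative: the slides do not describe how policers are programmed, no real switch API is used, and `rate_updates` and the sample reservation are hypothetical.

```python
from typing import List, Tuple


def rate_updates(reservation: List[float], slot_seconds: float) -> List[Tuple[float, float]]:
    """Compress a per-slot reservation into (time, new_rate) change points."""
    updates: List[Tuple[float, float]] = []
    previous = None
    for i, rate in enumerate(reservation):
        if rate != previous:
            updates.append((i * slot_seconds, rate))
            previous = rate
    return updates


# Hypothetical per-slot reservation in Mbps, 10-second slots.
reservation = [50.0, 50.0, 500.0, 500.0, 500.0, 50.0, 50.0]
print(rate_updates(reservation, slot_seconds=10.0))
# [(0.0, 50.0), (20.0, 500.0), (50.0, 50.0)]
```

A pulse-shaped model yields only a handful of change points per job, which is consistent with the observation that few jobs share a link at once.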

36




How to efficiently allocate TIVC?

How to enforce TIVC?

37

Proteus: Implementing TIVC Models

1. Determine the model

2. Allocate and enforce the model

38

Evaluation

• Large-scale simulation
– Performance
– Cost
– Allocation algorithm

• Prototype implementation
– Small-scale testbed

39

Simulation Setup

• 3-level tree topology
– 16,000 Hosts x 4 VMs
– 4:1 oversubscription

• Workload
– N: exponential distribution around mean 49
– B(t): derive from real Hadoop apps

[Figure: 3-level tree topology; labels: 40 Hosts (1Gbps links), 20 ToR Switch (10Gbps links), 20 Aggr Switch (50Gbps links)]
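The topology numbers can be cross-checked with a little arithmetic. The sketch assumes the figure labels mean 40 hosts per ToR switch, 20 ToR switches per aggregation switch, and 20 aggregation switches, with 1Gbps host links, 10Gbps ToR uplinks, and 50Gbps aggregation uplinks; that reading of the figure is an assumption.

```python
HOSTS_PER_TOR = 40      # assumed reading of "40 Hosts"
TORS_PER_AGGR = 20      # assumed reading of "20 ToR Switch"
AGGR_SWITCHES = 20      # assumed reading of "20 Aggr Switch"
VMS_PER_HOST = 4
HOST_LINK_GBPS = 1
TOR_UPLINK_GBPS = 10
AGGR_UPLINK_GBPS = 50

hosts = HOSTS_PER_TOR * TORS_PER_AGGR * AGGR_SWITCHES
vms = hosts * VMS_PER_HOST
# Oversubscription = total downstream capacity / uplink capacity.
tor_oversub = HOSTS_PER_TOR * HOST_LINK_GBPS / TOR_UPLINK_GBPS
aggr_oversub = TORS_PER_AGGR * TOR_UPLINK_GBPS / AGGR_UPLINK_GBPS

print(hosts, vms, tor_oversub, aggr_oversub)  # 16000 64000 4.0 4.0
```

Under that reading, the figure agrees with the stated 16,000 hosts, the 64,000-VM data center, and 4:1 oversubscription at both levels.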

40

Batched Jobs

• Scenario: 5,000 time-insensitive jobs

Completion time reduction: 42%, 21%, 23%, 35%

Mixed workload: 1/3 of each type

All remaining results are for the mixed workload

41

Varying Oversubscription and Job Size

25.8% reduction for non-oversubscribed network

42

Dynamically Arriving Jobs

• Scenario: Accommodate users’ requests in shared data center
– 5,000 jobs, Poisson arrival, varying load

Rejected: VC: 9.5%, TIVC: 3.4%

43

Analysis: Higher Concurrency

• Under 80% load

7% higher job concurrency

28% higher VM utilization

Rejected jobs are large

28% higher revenue (charging for VMs)

44

Tenant Cost and Provider Revenue

• Charging model
– VM time T and reserved BW volume B
– Cost = N (kv T + kb B)
– kv = 0.004$/hr, kb = 0.00016$/GB
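The charging model above can be written out directly. In this sketch T is assumed to be hours per VM and B the reserved bandwidth volume in GB per VM, matching the units of kv and kb; the function name `tenant_cost` and the example job are hypothetical.

```python
K_V = 0.004    # $ per VM-hour, from the slide
K_B = 0.00016  # $ per GB of reserved bandwidth volume, from the slide


def tenant_cost(n_vms: int, vm_hours: float, reserved_gb: float) -> float:
    """Cost = N * (kv * T + kb * B)."""
    return n_vms * (K_V * vm_hours + K_B * reserved_gb)


# Hypothetical job: 10 VMs, 2 hours each, 100 GB reserved volume per VM.
print(round(tenant_cost(10, vm_hours=2.0, reserved_gb=100.0), 4))  # 0.24
```

Because B is the reserved volume rather than the used volume, a model that reserves less for the same traffic lowers the tenant's bill even when the run time is unchanged.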

12% less cost for tenants

Providers make more money

Amazon target utilization

45

Testbed Experiment

• Setup
– 18 machines
– Tc and NetFPGA rate limiter

• Real MapReduce jobs

• Procedure
– Offline profiling
– Online reservation

46

Testbed Result

TIVC finishes job faster than VC, Baseline finishes the fastest

Baseline suffers elongation, TIVC achieves similar performance as VC

47

Conclusion

• Network reservations in cloud are important
– Previous work proposed fixed-BW reservations
– However, cloud apps exhibit time-varying BW usage

• We propose TIVC abstraction
– Provides time-varying network reservations
– Uses simple pulse functions
– Automatically generates model
– Efficiently allocates and enforces reservations

• Proteus shows TIVC benefits both cloud provider and users significantly

48

Backup slides

49

Adding Cushions to Model

[Figure panels: without cushion; with 60s cushion]

50

Network Utilization

VC reserves 26.4% (absolute) more bandwidth

But less actual utilization (8.9% vs. 20.1%)

51

BW Variability on Cloud

[Ballani’11]

52

Model Refinement

• Can we further reduce BW for low-efficiency pulses without elongation?
– This potentially allows us to fit more jobs

Hadoop Hive Join

53

Model Refinement (cont.)

• If efficiency of a pulse < γ, lower the cap so that efficiency = α
• γ = 8%, α = 20%
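The refinement rule is simple arithmetic, sketched below. A pulse's efficiency is taken as its traffic volume divided by cap times duration, so the cap that yields efficiency α is volume / (α × duration). The function name `refine_cap` and the example pulse are hypothetical, and the sketch does not check whether the lower cap would elongate the job.

```python
GAMMA = 0.08  # refine pulses whose efficiency is below this (from the slide)
ALPHA = 0.20  # target efficiency after refinement (from the slide)


def refine_cap(traffic_volume: float, cap: float, duration: float) -> float:
    """Return the (possibly lowered) cap for one pulse.

    traffic_volume is in megabits, cap in Mbps, duration in seconds.
    """
    pulse_efficiency = traffic_volume / (cap * duration)
    if pulse_efficiency < GAMMA:
        return traffic_volume / (ALPHA * duration)
    return cap


# Hypothetical pulse: 500 Mbps cap for 10 s carrying only 250 Mb (5% efficient).
print(refine_cap(250.0, cap=500.0, duration=10.0))   # 125.0
# A pulse already at 20% efficiency is left unchanged.
print(refine_cap(1000.0, cap=500.0, duration=10.0))  # 500.0
```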
