Storage Management in Virtualized Cloud Environments

Sankaran Sivathanu, Ling Liu, Mei Yiduo and Xing Pu

Student Workshop on Frontiers of Cloud Computing, IBM 2010

Talk Outline

• Introduction
• Measurement results & Observations
  – Data Placement & Provisioning
  – Workload Interference
  – Impacts of Virtualization
• Summary

Cloud & Virtualization

• Cloud Environment – Goals
  – Flexibility in resource configuration
  – Maximum resource utilization
  – Pay-per-use model
• Virtualization – Benefits
  – Resource consolidation
  – Re-structuring flexibility
  – Separate protection domains
• Virtualization serves as one of the basic foundations of cloud infrastructures

Fundamental Issues

• Cloud Service Providers (CSPs) vs. Customers
  – Customers purchase computing resources
  – CSPs provide virtual resources (VMs)
  – Customers perceive their resources as physical machines!
• Multiple VMs reside in a single physical host
  – Resource interference
  – End-user performance depends on other users
• End-users are unaware of where their data physically exists

Goals of our Measurement

• For cloud service providers
  – How to place data such that end-user performance is maximized?
  – How to co-locate workloads for least interference?
• For end-users
  – How to purchase resources in tune with requirements?
  – How to tune applications for maximum performance?
• General insights on storage I/O in virtualized environments

Benchmarks Used

• Postmark
  – Mail server workload
  – Create/Delete, Read/Append files
  – Parameters
    • File size
    • # of files
    • Read/Write ratio
• Synthetic workload
  – Sequential vs. random accesses
  – Zipf distribution
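The three access patterns named for the synthetic workload can be illustrated with a small generator. This is only a sketch: the function names, the block counts, and the Zipf exponent `s` are assumptions made for illustration, not the parameters of the benchmark used in the talk.

```python
import random
from itertools import accumulate


def sequential_blocks(start, count):
    """Consecutive block numbers, modelling a sequential scan."""
    return list(range(start, start + count))


def random_blocks(num_blocks, count, rng):
    """Uniformly random block numbers in [0, num_blocks)."""
    return [rng.randrange(num_blocks) for _ in range(count)]


def zipf_blocks(num_blocks, count, rng, s=1.0):
    """Block numbers drawn from a Zipf distribution over ranks 1..num_blocks.

    Rank r is chosen with probability proportional to 1 / r**s, so a few
    low-numbered blocks receive most of the accesses.
    """
    weights = [1.0 / (rank ** s) for rank in range(1, num_blocks + 1)]
    cum_weights = list(accumulate(weights))
    return rng.choices(range(num_blocks), cum_weights=cum_weights, k=count)


rng = random.Random(42)  # fixed seed so runs are repeatable
seq = sequential_blocks(0, 8)
rnd = random_blocks(1000, 8, rng)
zipf = zipf_blocks(1000, 10000, rng)

# Share of Zipf accesses that land on the 10 most popular blocks.
hot_fraction = sum(1 for block in zipf if block < 10) / len(zipf)
```

With an exponent of 1.0 a small set of blocks absorbs a large share of the accesses, which is the kind of skew a Zipf-distributed workload is meant to model.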

Data Provisioning & Placement

Disk Provisioning

• Consider a 100GB disk and a workload with a data footprint of ~150MB
• Case I: 4GB partition, throughput 2.1 MB/s
• Case II: 40GB partition, throughput 1.4 MB/s
• Performance difference: 33%
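The 33% figure follows from the two reported throughputs. The sketch below assumes that 2.1 MB/s and 1.4 MB/s belong to the 4GB and 40GB partitions in that order, and that the difference is measured relative to the faster case; the helper name `relative_drop` is hypothetical.

```python
def relative_drop(baseline, other):
    """Percentage by which `other` falls short of `baseline`."""
    return (baseline - other) / baseline * 100.0


small_partition_mbps = 2.1  # assumed to be the 4GB partition case
large_partition_mbps = 1.4  # assumed to be the 40GB partition case

drop = relative_drop(small_partition_mbps, large_partition_mbps)
print(f"Throughput drop on the larger partition: {drop:.1f}%")
```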

Where to place the VM disk?

• Postmark benchmark
  – Read operation
• Cases:
  – Read from physical partitions in different zones
    • Based on LBNs
    • LBNs start from the inner zone and proceed towards the outer zones
  – Read from disk file (.vmdk)

Where to place multiple VM disks?

• Postmark benchmark
  – 2 instances (1 for each VM)
    • Random reads
• Compare physical partitions placed in different zones
  – O -> Outer
  – I -> Inner
  – M -> Mid

Observations

• Customers should purchase storage based on workload requirement, not price
• Thin provisioning may be practiced
• Throughput-intensive VMs can be placed in outer disk zones
• Multiple VMs that may be accessed simultaneously should be co-located on disk
  – CSPs can monitor access patterns and move virtual disks accordingly
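One way a provider might act on the outer-zone observation is a greedy assignment that gives the most throughput-hungry VM the outermost zone. The sketch below is an illustration under assumed inputs, not a method described in the talk: the VM names, demand figures, zone labels, and the function `assign_zones` are all hypothetical.

```python
def assign_zones(vm_throughput_demand, zones_outer_to_inner):
    """Greedy placement: highest throughput demand gets the outermost zone.

    vm_throughput_demand: dict of VM name -> expected throughput demand (MB/s).
    zones_outer_to_inner: zone labels ordered from outermost to innermost.
    Returns a dict of VM name -> zone label. If there are more VMs than
    zones, the remaining VMs share the innermost zone.
    """
    ranked = sorted(vm_throughput_demand, key=vm_throughput_demand.get, reverse=True)
    placement = {}
    for index, vm in enumerate(ranked):
        zone_index = min(index, len(zones_outer_to_inner) - 1)
        placement[vm] = zones_outer_to_inner[zone_index]
    return placement


# Hypothetical demand figures, for illustration only.
demand = {"vm-mail": 5.0, "vm-backup": 40.0, "vm-web": 12.0}
placement = assign_zones(demand, ["outer", "mid", "inner"])
```

A real placement policy would also need to account for which virtual disks are accessed together, since the observations above suggest keeping those close to each other on disk.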

Workload Interference

CPU-Disk Interference

[Figure: VM-1 and VM-2 on a single physical host, each with its own CPU, running disk and CPU workloads. Reported disk throughput: 23.4 MB/s vs. 27.6 MB/s; performance difference: 15.3%]

• CPU allocation ratios have no effect on disk throughput across VMs
• A disk-intensive job performs better when run alongside a CPU-intensive job

Reason?

Dynamic Frequency Scaling

• CPU DFS is enabled in Linux by default
• Three ‘governors’ control the DFS policy
  – On-demand (default)
  – Performance
  – Power-save
• When 1 core is idle, the entire CPU is down-scaled because overall CPU utilization falls
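To check which governor a host is actually using, the active policy can be read from the Linux cpufreq sysfs files. The sketch below assumes the standard `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` layout, which exists only when a cpufreq driver is loaded; the function name `read_governors` is hypothetical.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs location (assumed; absent without a cpufreq driver).
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root=CPUFREQ_ROOT):
    """Return {cpu_name: governor} for every CPU exposing a cpufreq governor."""
    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors
```

On a host with cpufreq enabled, `read_governors()` returns one entry per CPU. The kernel spells the three policies named above as `ondemand`, `performance`, and `powersave`. On a host without cpufreq support the function returns an empty dictionary.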

Disk-Disk Interference

• 1 instance of Postmark in each VM
• 65.3% more time taken when compared to running Postmark in a single VM
• Overhead mainly attributed to disk seeks: accesses are no longer sequential

[Figure: VM-1 and VM-2 on one physical host, each with its own CPU and virtual disk (V.Disk-1, V.Disk-2), both virtual disks on the same physical disk]
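The loss of sequentiality when two VMs share one physical disk can be illustrated with a toy model of head travel. This is a sketch under strong assumptions: block addresses stand in for physical position, the two virtual disks sit in distant regions, and requests from the two VMs alternate strictly, which is a worst case rather than what a real scheduler would produce. All names and numbers are hypothetical.

```python
def total_head_travel(block_sequence):
    """Sum of absolute distances between consecutive block addresses."""
    return sum(abs(b - a) for a, b in zip(block_sequence, block_sequence[1:]))


def interleave(stream_a, stream_b):
    """Alternate requests from two streams, as a shared disk might see them."""
    merged = []
    for a, b in zip(stream_a, stream_b):
        merged.extend([a, b])
    return merged


# Hypothetical layout: each virtual disk occupies its own region of the disk.
vm1_stream = list(range(0, 100))                  # sequential reads in V.Disk-1
vm2_stream = list(range(1_000_000, 1_000_100))    # sequential reads in V.Disk-2

alone = total_head_travel(vm1_stream)
shared = total_head_travel(interleave(vm1_stream, vm2_stream))
```

Each stream is perfectly sequential on its own, yet the interleaved request sequence forces a long jump on almost every request, so `shared` is several orders of magnitude larger than `alone`.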

• VMs using separate physical disks
• 17.52% more time taken when compared to running Postmark in a single VM
• Overhead attributed to contention in Dom-0’s queue structures

[Figure: VM-1 and VM-2 on one physical host, each with its own CPU and virtual disk; V.Disk-1 is on Disk-1 and V.Disk-2 is on Disk-2]

• Postmark benchmark (reads)
• Cases:
  – Running in a single VM
  – 1 instance in each of two VMs
    • 2 VMs reading from virtual disks on the same physical disk
    • 2 VMs reading from virtual disks on different physical disks

• I/O scheduling policy in Dom-0 has little effect
• The ‘Ideal’ case is the time taken when running Postmark in a single VM
• The other cases run 1 instance of Postmark in each of 2 VMs (separate physical disks)

• Interference with respect to workload type
• Synthetic read workload
• VMs use separate physical disks
• Cases:
  – Mix of sequential versus random reads
• Sequential requests from both VMs flood the Dom-0 queue, causing contention

Observations

• CPU-intensive and disk-intensive workloads can be co-located for optimal performance and power
• Virtual disks that may be accessed simultaneously must be placed on separate physical disks
• I/O scheduling in Dom-0 has little effect on disk workload interference
• Two sequential workloads, when co-located, suffer in performance due to queue contention
• With separate disks, workload contention is generally minimal, except in the case of two sequential workloads

Impacts of Virtualization

Sequentiality

• Postmark benchmark (reads)

• Not much overhead is seen for random disk accesses
• VM overhead is masked by the larger disk overhead
• The overhead is felt more for sequential disk accesses

Block Size

• Postmark sequential reads

• Fixed overhead with every request
• As block size increases, the number of requests is reduced, hence the overhead is reduced
• It is more efficient to read in larger blocks
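The effect of a fixed per-request overhead can be sketched with a simple cost model: total time is the number of requests times the fixed overhead, plus the transfer time for the data. The model and every number in it (read size, per-request overhead, bandwidth) are assumptions chosen for illustration, not measurements from the talk.

```python
import math


def read_time_seconds(total_bytes, block_size, per_request_overhead_s, bandwidth_bytes_per_s):
    """Fixed cost per request plus transfer time for the whole read."""
    num_requests = math.ceil(total_bytes / block_size)
    return num_requests * per_request_overhead_s + total_bytes / bandwidth_bytes_per_s


TOTAL = 64 * 1024 * 1024       # 64 MiB read sequentially (hypothetical)
OVERHEAD = 100e-6              # 100 microseconds per request (hypothetical)
BANDWIDTH = 50 * 1024 * 1024   # 50 MiB/s transfer rate (hypothetical)

for block_size in (4096, 65536, 1048576):
    elapsed = read_time_seconds(TOTAL, block_size, OVERHEAD, BANDWIDTH)
    print(f"{block_size:>8} B blocks: {elapsed:.3f} s")
```

In this model the transfer term is constant, so the whole benefit of larger blocks comes from issuing fewer requests. It does not capture the case where a larger block fetches data the workload never uses.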

Block Size w.r.t. Locality

Observations

• VM overhead is not felt in random workloads: it is amortized by disk seeks

• Extra layers of indirection are the reason for VM overhead; when block size is large, the overhead is amortized

• Block size may be increased only if there is sufficient locality in access

Summary

• Storage purchased must depend on requirement, not price!
• It is better to place sequentially accessed streams in the outer disk zone
• Co-locate virtual disks that may be accessed simultaneously
• Co-locate a CPU-intensive task with a disk-intensive task for better power and performance
• Avoid co-locating two sequential workloads on a single physical machine, even when they go to separate physical disks!
• Read in large blocks only when there is locality in the workload

Questions

Contact : [email protected]
