Storage Networks How to Handle Heterogeneity Bálint Miklós January 24th, 2005 ETH Zürich External Memory Algorithms and Data Structures


Storage Networks
How to Handle Heterogeneity

Bálint Miklós
January 24th, 2005
ETH Zürich

External Memory Algorithms and Data Structures


What Are Storage Networks?

• Persistent storage: hard disks
• Device capacity doubles every 14-18 months, but data grows faster
• Use many disks
• Need to protect, access, and manage the ever-growing volume of storage assets

Storage Networks – Motivation


Hardware Failures

• power supply: 6%
• FS error: 6%
• disk subsystem: 10%
• disk error: 10%
• disk failure: 42%
• others: 26%

Trace collected from the Internet Archive (March 2003), courtesy of David Pease (UCSC) & Kelly Gottlib


Heterogeneous Storage Networks

• To increase system speed or capacity: add new disks

• New disks usually have different characteristics than the older disks in the system.

• Many modern storage systems are distributed: Ethernet, FibreChannel.

• How to exploit this heterogeneity?


Goal

• Storage system requirements:
  – space and access balance
  – availability
  – resource efficiency
  – access efficiency
  – heterogeneity
  – adaptivity
  – locality

• Very difficult to meet ALL requirements.


Outline

• Model

• AdaptRaid
• HERA
• RIO

• Conclusions


What Model to Use?

• Why not use the layout of external memory algorithms?
  – We need a solution for all the (sub)problems
  – One has to bypass the operating system: a complex task

• Therefore a different abstraction level:
  – Set of disks characterized by capacity and bandwidth
  – Connection network is unrestricted: e.g. SCSI, P2P

Storage Networks – Model


Model assumptions

• Disk access patterns are generated by the file system (OS)

• These patterns are difficult to predict and can change

• Assume a uniform pattern; our goal is to distribute data evenly


Heterogeneous Storage Networks

• Straightforward solution:
  – Cluster disks according to their characteristics
  – We can have many clusters
  – Easy to extend
  – New, faster disks do not improve overall response time

• Randomized batched solution [Sanders]:
  – Map data randomly to disks
  – Schedule a batch of accesses by solving a network flow problem
  – Infeasible for large systems: many flow problems to be solved
  – Batch-like behavior is a disadvantage


Storage Networks – Heterogeneity


RAID

• Redundant Array of Inexpensive Disks

• RAID level 0:
  – Stripe data across a set of disks

• RAID level 5:
  – Add a redundancy block per stripe
  – Distribute redundancy information evenly over all disks
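The RAID 5 bullets can be illustrated with a small sketch. It assumes the redundancy block is a bytewise XOR parity (the usual RAID 5 choice) and uses a simple rotation to spread parity over the disks; the function names and the rotation rule are illustrative, not taken from the slides.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_disk(stripe, n_disks):
    """Disk holding the parity block of a stripe (assumed rotation scheme)."""
    return (n_disks - 1 - stripe) % n_disks

# Three data blocks of one stripe and their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Lose the second data block and rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, any single lost block of a stripe can be rebuilt from the remaining blocks.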


Storage Networks – AdaptRaid

www.raidrecoveryguide.com


AdaptRaid 0

• Basic idea:
  – Load each disk depending on its characteristics

• First solution:
  – Use all disks as in RAID 0 until the smallest disk is full
  – Then discard the full disks and continue the same way
  – Distribution continues until all disks are full

• The lower portion of the address space has better access times

• Extends the RAID layout for heterogeneity [Cortes, Labarta]
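A minimal sketch of the "first solution", assuming capacities are given in blocks and one block per disk per stripe. The function name `adaptraid0_layout` is hypothetical and is not the authors' implementation.

```python
def adaptraid0_layout(capacities):
    """Return a list of stripes; each stripe lists the disk indices it uses.

    capacities[i] is the number of blocks on disk i (assumed unit).
    """
    remaining = list(capacities)
    stripes = []
    while any(remaining):
        # Only disks that still have free blocks take part in the next stripe.
        active = [d for d, free in enumerate(remaining) if free > 0]
        stripes.append(active)
        for d in active:
            remaining[d] -= 1
    return stripes

layout = adaptraid0_layout([2, 4, 4])
```

The early stripes span all three disks and the later ones only the two larger disks, which is why the lower part of the address space sees better access times.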


AdaptRaid 0 – Reducing Variance

• Reduce variance:
  – The algorithm temporarily assumes that the disks are smaller
  – Repeat the resulting pattern several times

• Stripes in a Pattern (SIP) defines the size of the pattern and the degree of variance

• Each disk holds the same number of blocks as before
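One way to picture the pattern idea, as a sketch only: shrink every disk by a common factor, lay out one pattern on the shrunken disks, then repeat it. The parameter `repeats` and the requirement that it divides every capacity are simplifying assumptions of this sketch, not part of the slides.

```python
def pattern_layout(capacities, repeats):
    """Lay out stripes by repeating a small pattern `repeats` times."""
    if any(c % repeats for c in capacities):
        raise ValueError("sketch assumes every capacity is a multiple of repeats")
    # Temporarily pretend the disks are smaller.
    small = [c // repeats for c in capacities]
    pattern = []
    while any(small):
        active = [d for d, free in enumerate(small) if free > 0]
        pattern.append(active)
        for d in active:
            small[d] -= 1
    # len(pattern) plays the role of SIP: the number of stripes in one pattern.
    return pattern * repeats

layout = pattern_layout([2, 4, 4], repeats=2)
blocks_per_disk = [sum(d in stripe for stripe in layout) for d in range(3)]
```

Wide and narrow stripes now alternate across the whole address space instead of the narrow ones piling up at the end, while each disk still stores as many blocks as before.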


AdaptRaid 5

• Similar idea, but one block per stripe is used for parity information

• Difference: a write implies updating the parity

• If not all the blocks in the stripe are written, a write needs an additional read: the small-write problem
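The small-write problem can be seen in a sketch that assumes XOR parity: to overwrite a single block, the old data and the old parity must first be read so that the new parity can be computed. The helper names are illustrative.

```python
def xor(a, b):
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data, old_parity, new_data):
    """New parity after overwriting one data block (read-modify-write)."""
    return xor(xor(old_parity, old_data), new_data)

stripe = [b"\x0f", b"\xf0", b"\x55"]
parity = xor(xor(stripe[0], stripe[1]), stripe[2])

# Overwrite only the second block: needs the old block and the old parity.
new_block = b"\xaa"
new_parity = small_write(stripe[1], parity, new_block)
stripe[1] = new_block
```

A full-stripe write avoids the extra reads because the parity can be computed from the new data alone.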


AdaptRaid 5 – Small-write Solution

• Reference stripe: the stripe the OS assumes to be a full stripe

• The size of every stripe is a divisor of the reference stripe size

• Logically three steps:
  – Decrease the stripe size
  – Distribute the empty space evenly over all disks
  – Apply a Tetris-like method to fill the empty blocks


AdaptRaid 5 – variance reduction

• We can use a variance reduction similar to the one in AdaptRaid 0:
  – Repeat a smaller pattern several times


AdaptRaid – generalization

• What if the bigger disks are not the faster ones?

• Until now we tried to use all blocks of a disk; now we want to use fewer blocks on slow disks

• Utilization Factor (UF):
  – a value between 0 and 1 per disk

• UF can be set based on:
  – disk size (as until now)
  – performance


AdaptRaid – summary

• Decide UF for every disk:
  – How much we want to load a disk

• Decide SIP for the system:
  – How big the pattern is

• Performance (measured by simulators):

  Scheme        Adaptivity   Speedup
  AdaptRaid 0   RAID 0       8%-35%
  AdaptRaid 5   ?            < 30%


Heterogeneous Extension of RAID

• Disk merging technique
• Disks are partitioned into logical disks
• Logical disks have the same bandwidth and capacity

• We group the logical disks into G parity groups

• We have G homogeneous systems

Storage Networks – HERA


Heterogeneous Extension of RAID

• Constraint: each logical disk in a parity group should map to a different physical disk

[Formula not recoverable from the extracted slide; surviving symbols: i, l, p, D, G]
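The parity-group constraint can be checked mechanically. The sketch below assumes a hypothetical mapping `physical_of` from logical disk names to physical disk names; neither the names nor the data structure come from HERA itself.

```python
def valid_parity_group(group, physical_of):
    """True if all logical disks in the group live on distinct physical disks."""
    physical = [physical_of[logical] for logical in group]
    return len(set(physical)) == len(physical)

# Hypothetical setup: physical disk A holds 2 logical disks, B holds 1, C holds 3.
physical_of = {"A0": "A", "A1": "A", "B0": "B", "C0": "C", "C1": "C", "C2": "C"}

ok = valid_parity_group(["A0", "B0", "C0"], physical_of)
bad = valid_parity_group(["A0", "A1", "C0"], physical_of)
```

The second group is rejected because a failure of physical disk A would take out two members of the same parity group, which a single parity block cannot repair.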


Heterogeneous Extension of RAID

• Read: an online load-balancing algorithm directs each request for a block to the least loaded disk.

• Every disk has a queue with all its reads and their deadlines.

• Deliver requested blocks based on deadline, and location on disk (to minimize seek-time overhead)


Heterogeneous Extension of RAID

• The availability is almost as good as in the homogeneous case (RAID 5).

• But it is much more flexible than RAID 5.

• Performance relies on the logical disk distribution, which is the task of the administrator.

• The authors recently (December 2004) proposed a configuration planning algorithm which optimizes for bandwidth and storage: [Zimmermann, Ghandeharizadeh: Highly Available and Heterogeneous Continuous Media Storage Systems]


Random I/O Mediaserver

• Randomized distribution strategy

• Concentrates on delivering multimedia objects. Optimized for real-time reading:
  – Video on demand
  – 3D interactive virtual world navigation
  – Interactive scientific visualization

• Idea: place each data unit on a random disk at a random position. This ensures long-term load balance.

Storage Networks – RIO


Homogeneous RIO – Data Placement

• A multimedia object is composed of a sequence of constant-size data blocks.

• Each data block is placed on a random disk at a random location -> long-term load balancing

• By replicating a fraction of the data blocks, we allow short-term load balancing


Homogeneous RIO – Read Scheduler

• All reads have a deadline. Non-real-time requests have an infinite deadline.

• A request for a block is routed to the disk with the least load

• A disk serves several block requests per cycle:
  – A number of blocks are selected from the disk request queue
  – The selected requests are reordered according to their location on disk to minimize the seek-time overhead, and then serviced
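A rough sketch of the two scheduling steps, under assumptions the slides leave open: load is measured as queue length, routing chooses among the disks that hold a copy of the block, and each cycle takes the requests with the earliest deadlines. All names and the request format are hypothetical.

```python
import math

def route(replica_disks, queues):
    """Pick the replica disk with the shortest request queue (assumed load metric)."""
    return min(replica_disks, key=lambda d: len(queues[d]))

def next_cycle(queue, batch_size):
    """Take the batch_size most urgent requests, then order them by disk position."""
    queue.sort(key=lambda req: req["deadline"])
    batch, queue[:] = queue[:batch_size], queue[batch_size:]
    return sorted(batch, key=lambda req: req["position"])

queues = {0: [{"deadline": 1, "position": 5}, {"deadline": 2, "position": 7}],
          1: [{"deadline": 1, "position": 3}]}
target = route([0, 1], queues)

queue = [{"deadline": 5, "position": 900},
         {"deadline": math.inf, "position": 10},   # non-real-time request
         {"deadline": 3, "position": 400},
         {"deadline": 4, "position": 100}]
batch = next_cycle(queue, batch_size=3)
```

Sorting the selected batch by position lets the disk head sweep in one direction, which is the seek-time saving the slide refers to.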


Heterogeneous RIO – Data Placement

• Place data on a disk with probability proportional to its size

• Probability to place data on disk $j$: $d_j = \frac{C_j}{S}$, where $C_j$ is the capacity of disk $j$ and $S$ is the total capacity

• Note that: $\sum_{j=1}^{n} d_j = 1$

• Disk capacity is increasing faster than disk bandwidth -> faster, bigger disks are going to be the bottleneck
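Capacity-proportional placement can be sketched with the standard library's weighted sampling. The function names are illustrative, and capacities may be in any consistent unit.

```python
import random

def placement_probabilities(capacities):
    """Probability of choosing each disk: its capacity over the total capacity."""
    total = sum(capacities)
    return [c / total for c in capacities]

def place_block(capacities, rng=random):
    """Choose a disk index with probability proportional to its capacity."""
    return rng.choices(range(len(capacities)), weights=capacities, k=1)[0]

probs = placement_probabilities([100, 300])
rng = random.Random(0)            # seeded only to make the sketch repeatable
picks = [place_block([100, 300], rng) for _ in range(2000)]
```

With these capacities the larger disk receives about three quarters of the blocks, and therefore about three quarters of the read load, whether or not its bandwidth can keep up.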


Heterogeneous RIO – BSR

• n disks ($D_i$):
  – Capacity: $C_i$
  – Bandwidth: $B_i$

• Total capacity: $C = \sum_{i=1}^{n} C_i$

• Total bandwidth: $B = \sum_{i=1}^{n} B_i$

• Bandwidth space ratio (BSR): $bs_i = \frac{B_i / B}{C_i / C}$

• The BSR is a hint of how much load a disk can take


Heterogeneous RIO – Clusters

• Goal: redirect load from low-BSR disks to higher-BSR disks.

• Group disks into clusters based on their BSRs.

• Low-BSR clusters would have high load.
• How much replication do we need to sustain a certain load?


Heterogeneous RIO – Replication Factor

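• We want to sustain a maximum load of $B_{max} \le B$

• Data without replicas, for replication factor $r$: $D = \frac{C}{1 + r}$

• Maximum load on cluster $i$ is: $B_i^{max} = B_{max} \frac{C_i}{D} = B_{max} \frac{(1 + r) C_i}{C}$

• To use all bandwidth we need $B_{max} = B$ and $B_i^{max} \ge B_i$:

  $1 + r \ge \frac{B_i / B}{C_i / C} = bs_i$ -> $r \ge \max_i(bs_i) - 1$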
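The bound on the replication factor is easy to evaluate numerically. The sketch below assumes the normalized BSR definition, $bs_i = (B_i/B)/(C_i/C)$, and the resulting bound $r \ge \max_i(bs_i) - 1$; the cluster figures are made up for illustration.

```python
def bsr(capacities, bandwidths):
    """Normalized bandwidth space ratio of each cluster."""
    total_c, total_b = sum(capacities), sum(bandwidths)
    return [(b / total_b) / (c / total_c) for c, b in zip(capacities, bandwidths)]

def min_replication(capacities, bandwidths):
    """Smallest replication factor r that lets every cluster use its full bandwidth."""
    return max(bsr(capacities, bandwidths)) - 1

# Hypothetical clusters: same bandwidth, different capacity.
capacities = [200, 300]   # e.g. GB
bandwidths = [30, 30]     # e.g. MB/s
ratios = bsr(capacities, bandwidths)
r = min_replication(capacities, bandwidths)
```

In this made-up case the smaller cluster has a BSR of about 1.25, so roughly 25% of the data must be replicated before that cluster's bandwidth can be fully used.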


Heterogeneous RIO – Summary

• Randomized data placement

• Read scheduler to optimize read bandwidth

• Depending on the disk characteristics, a different replication factor is needed to sustain a certain bandwidth

• The authors claim that in a few years 10% to 40% replication will be sufficient to use the full aggregate bandwidth of the network


Conclusions

• All three methods concentrate on optimizing bandwidth and space utilization. Adaptivity is hard to achieve

• AdaptRaid and HERA
  – Deterministic
  – Extend homogeneous RAID
  – AdaptRaid 5 wastes space?

• RIO
  – Randomized
  – How fast is the read scheduler?
  – The only one where the authors showed a real-life implementation (Virtual World Data Center)

Storage Networks – Conclusions


Storage Networks

Thank You!

Questions?

Bálint Miklós
