Ji-Yong Shin
Cornell University
In collaboration with Mahesh Balakrishnan (MSR SVC),
Tudor Marian (Google), and Hakim Weatherspoon (Cornell)
Gecko: Contention-Oblivious Disk Arrays for Cloud Storage
FAST 2013
Cloud and Virtualization
• What happens to storage?
[Diagram: guest VMs on a VMM issue SEQUENTIAL I/O streams that arrive at the shared disks interleaved, i.e., RANDOM]
2
[Figure: throughput (MB/s) vs. number of sequential writers (1-7) on VM+EXT4; sequential throughput (Seq - VM+EXT4) collapses toward random throughput (Rand - VM+EXT4) as writers are added]
I/O CONTENTION causes throughput collapse (4-disk RAID0)
Existing Solutions for I/O Contention?
• I/O scheduling: reordering I/Os
– Entails increased latency for certain workloads
– May still require seeking
• Workload placement: positioning workloads to minimize contention
– Requires prior knowledge or dynamic prediction
– Predictions may be inaccurate
– Limits freedom of placing VMs in the cloud
3
Log-Structured File System to the Rescue? [Rosenblum et al. 91]
• Log all writes to the tail
[Diagram: logical addresses 0..N remapped onto a log; all writes append at the log tail of the shared disk]
4
Challenges of Log-Structured File System
• Garbage collection is the Achilles' Heel of LFS [Seltzer et al. 93, 95; Matthews et al. 97]
– Garbage collection (GC) reads live data from the log head while new writes go to the log tail
[Diagram: GC at the log head contending with writes at the log tail of the shared disks]
GC and reads still cause disk seeks
5
Challenges of Log-Structured File System
• Garbage collection is the Achilles’ Heel of LFS
– 2-disk RAID-0 configuration running LFS
– GC under a write-only synthetic workload
6
[Figure: RAID-0 + LFS throughput (MB/s) over time (s), showing GC, aggregate, and application (App) throughput against the max sequential throughput]
Throughput falls by 10X during GC
Problem:
Increased virtualization leads to
increased disk seeks and kills performance
RAID and LFS do not solve the problem
7
Rest of the Talk
• Motivation
• Gecko: contention-oblivious disk storage
– Sources of I/O contention
– New technique: Chained logging
– Implementation
• Evaluation
• Conclusion
8
What Causes Disk Seeks?
• Write-write
• Read-read
• Write-read
9
[Diagram: VM 1 and VM 2 issue interleaved writes and reads, forcing the shared disk to seek between them]
What Causes Disk Seeks?
• Write-write
• Read-read
• Write-read
• Logging
– Write-GC read
– Read-GC read
10
[Diagram: logging adds GC reads that contend with VM writes and reads]
Principle: A single sequentially accessed disk is better
than multiple randomly seeking disks
11
Gecko’s Chained Logging Design
• Separating the log tail from the body
– GC reads do not interrupt the sequential write
– 1 uncontended drive >>faster>> N contended drives
[Diagram: the physical address space is chained across Disk 0, Disk 1, and Disk 2; the log tail is confined to Disk 2]
12
Eliminates write-write and reduces write-read contention
Gecko’s Chained Logging Design
• Separating the log tail from the body
– GC reads do not interrupt the sequential write
– 1 uncontended drive >>faster>> N contended drives
[Diagram: garbage collection proceeds from the log head (body disks) to the log tail (Disk 2)]
13
Eliminates write-write and reduces write-read contention
Eliminates write-GC-read contention
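To make the chained-log write path concrete, here is a minimal user-space sketch in C (an illustration under assumed structures, not Gecko's actual code): writes always append at the tail offset, and when the tail disk fills, the tail hops to the next disk in the chain.

/* Chained-log append: a minimal sketch (names and layout are
 * assumptions for illustration, not Gecko's implementation). */
#include <stdint.h>
#include <unistd.h>

#define PAGE_SIZE 4096

struct disk {
    int      fd;        /* backing device or file */
    uint64_t npages;    /* capacity in 4KB pages */
};

struct chain {
    struct disk *disks;     /* body disks first, tail disk last */
    int          ndisks;
    int          tail_disk; /* index of the disk holding the log tail */
    uint64_t     tail_off;  /* next free page on that disk */
};

/* Append one 4KB page at the log tail.  Only the tail disk ever takes
 * writes, so the write stream stays sequential and never contends with
 * GC reads on the body disks.  Returns the global physical page number,
 * or -1 if the chain is full (GC must free space). */
int64_t chain_append(struct chain *c, const void *page)
{
    if (c->tail_off == c->disks[c->tail_disk].npages) {
        if (c->tail_disk + 1 == c->ndisks)
            return -1;
        c->tail_disk++;         /* hop the tail to the next disk */
        c->tail_off = 0;
    }
    struct disk *d = &c->disks[c->tail_disk];
    if (pwrite(d->fd, page, PAGE_SIZE,
               (off_t)(c->tail_off * PAGE_SIZE)) != PAGE_SIZE)
        return -1;
    uint64_t global = c->tail_off++;
    for (int i = 0; i < c->tail_disk; i++)
        global += c->disks[i].npages;
    return (int64_t)global;
}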
Gecko’s Chained Logging Design
• Smarter “Compact-In-Body” Garbage Collection
[Diagram: compact-in-body GC moves live data from the log head into free space within the log body]
Garbage collection from head to body
14
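The two policies differ in where live data lands: move-to-tail re-appends it at the tail, competing with application writes, while compact-in-body relocates it into free body pages, leaving tail bandwidth to the application. A self-contained toy model in C, purely illustrative:

/* Toy model contrasting the two GC policies (illustration only;
 * structure names are assumptions, not from the paper). */
#include <string.h>

#define NPAGES 16

enum gc_policy { GC_MOVE_TO_TAIL, GC_COMPACT_IN_BODY };

struct log_model {
    char data[NPAGES][8];   /* tiny stand-in "pages" */
    int  live[NPAGES];      /* liveness, as tracked by the maps */
    int  head, tail;        /* log head and tail page indexes */
};

/* Reclaim the head page.  body_free is a free page inside the body,
 * located by the cleaner. */
static void gc_head(struct log_model *m, enum gc_policy p, int body_free)
{
    if (m->live[m->head]) {
        int dst = (p == GC_MOVE_TO_TAIL)
                      ? m->tail++      /* consumes tail write bandwidth */
                      : body_free;     /* tail stays free for app writes */
        memcpy(m->data[dst], m->data[m->head], sizeof m->data[0]);
        m->live[dst] = 1;
    }
    m->live[m->head] = 0;              /* head page reclaimed */
    m->head++;
}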
Gecko Caching
• What happens to reads going to tail drives?
15
[Diagram: an LRU tail cache in front of the tail disk absorbs reads to recently written data; a body cache (flash) serves reads to the body disks]
Tail cache reduces write-read contention; body cache (flash) reduces read-read contention
Tail cache tiers: RAM for hot data, MLC SSD for warm data
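A sketch of the two-tier tail-cache read path (a toy direct-mapped table stands in for the real LRU; all names here are illustrative assumptions):

/* Two-tier tail cache lookup: RAM first, then SSD, then the disk. */
#include <stdint.h>
#include <stdbool.h>

#define SLOTS 1024

struct tier {                    /* toy direct-mapped cache tier */
    uint32_t tags[SLOTS];
    bool     valid[SLOTS];
};

static bool tier_hit(struct tier *t, uint32_t page)
{
    uint32_t i = page % SLOTS;
    return t->valid[i] && t->tags[i] == page;
}

static void tier_fill(struct tier *t, uint32_t page)
{
    uint32_t i = page % SLOTS;
    t->tags[i]  = page;
    t->valid[i] = true;
}

/* Returns 0 on a RAM hit, 1 on an SSD hit, 2 if the tail disk had to
 * be touched.  Hits in either tier keep reads away from the tail disk,
 * which is busy streaming sequential writes. */
static int tail_cache_read(struct tier *ram, struct tier *ssd, uint32_t page)
{
    if (tier_hit(ram, page))
        return 0;
    if (tier_hit(ssd, page)) {
        tier_fill(ram, page);    /* promote warm data to the RAM tier */
        return 1;
    }
    tier_fill(ssd, page);        /* after reading the disk, fill tiers */
    tier_fill(ram, page);
    return 2;
}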
Gecko Properties Summary
• No write-write contention, no GC-write contention, and reduced read-write contention
16
[Diagram: the chain mirrored as Disk 0/0', Disk 1/1', Disk 2/2', with the tail cache and body cache (flash) in front]
• Mirroring/striping: fault tolerance + read performance
• Power saving without consistency concerns
• Reduced read-write and read-read contention
Gecko Implementation
• Primary map: logical-to-physical map in memory, 4-byte entries (less than 8 GB of RAM for 8 TB of storage)
• Inverse map: physical-to-logical map in flash, 8 GB for 8 TB of storage (written every 1024 writes)
• 4KB pages
[Diagram: reads and writes translated through the in-memory map; head and tail pointers mark the filled region of the log across Disk 0, Disk 1, and Disk 2]
17
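The arithmetic behind the map sizes: 8 TB / 4 KB = 2^31 pages, and at 4 bytes per entry the full logical-to-physical map fits in 8 GB of RAM. A minimal sketch of the two maps under those assumptions (the code structure and flush_to_flash helper are hypothetical, not the actual implementation):

/* Sketch of Gecko's two maps (sizes from the slide). */
#include <stdint.h>

#define FLUSH_BATCH 1024        /* inverse map written every 1024 writes */

struct gecko_maps {
    uint32_t *l2p;              /* logical->physical, 4-byte entries,
                                   kept in RAM; 2^31 entries for 8TB */
    uint32_t  p2l_buf[FLUSH_BATCH]; /* buffered inverse-map entries,
                                       appended in physical write order
                                       so the flash copy is implicitly
                                       indexed by physical page */
    uint32_t  pending;          /* writes since the last flush */
};

/* Record one 4KB page write: the in-RAM primary map is updated
 * immediately (the read path uses only this), while inverse-map
 * entries are batched and flushed to flash every FLUSH_BATCH writes
 * for recovery and GC liveness checks. */
static void map_write(struct gecko_maps *m, uint32_t lpage, uint32_t ppage)
{
    m->l2p[lpage] = ppage;
    m->p2l_buf[m->pending++] = lpage;
    if (m->pending == FLUSH_BATCH) {
        /* flush_to_flash(m->p2l_buf, FLUSH_BATCH);  (hypothetical) */
        m->pending = 0;
    }
}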
Evaluation
1. How well does Gecko handle GC?
2. Performance of Gecko under real workloads?
3. Effect of varying Gecko chain length?
4. Effectiveness of the tail cache?
5. Durability of the flash-based tail cache?
18
Evaluation Setup
• In-kernel version
– Implemented as block device for portability
– Similar to software RAID
• Hardware
– WD 600GB HDD
• Used 512GB of 600GB
• 2.5” 10K RPM SATA-600
– Intel MLC (multi level cell) SSD
• 240GB SATA-600
• User-level emulator
– For fast prototyping
– Runs block traces
– Tail cache support
19
How Well Does Gecko Handle GC?
20
2-disk setting; write-only synthetic workload
[Figure: throughput (MB/s) over time (s) for Gecko vs. Log + RAID0, showing GC, aggregate, and application (App) throughput against the max sequential throughput of one and two disks; Gecko's aggregate throughput stays at the max, while Log + RAID0's collapses during GC]
Gecko's aggregate throughput always remains high: 3X higher aggregate and 4X higher application throughput
How Well Does Gecko Handle GC?
22
App throughput can be preserved using smarter GC
[Figure: Gecko application throughput (MB/s) over time (s), plus a bar chart comparing throughput with no GC, move-to-tail GC, and compact-in-body GC; compact-in-body GC preserves application throughput]
MS Enterprise and MSR Cambridge Traces [Narayanan et al. 08, 09]

Trace Name    | Est. Addr Space (GB) | Data Accessed (GB) | Data Read (GB) | Data Written (GB) | Total I/O Reqs | Read Reqs   | Write Reqs
prxy          | 136                  | 2,076              | 1,297          | 779               | 181,157,932    | 110,797,984 | 70,359,948
src1          | 820                  | 3,107              | 2,224          | 884               | 85,069,608     | 65,172,645  | 19,896,963
proj          | 4,102                | 2,279              | 1,937          | 342               | 65,841,031     | 55,809,869  | 10,031,162
Exchange      | 4,822                | 760                | 300            | 460               | 61,179,165     | 26,324,163  | 34,855,002
usr           | 2,461                | 2,625              | 2,530          | 96                | 58,091,915     | 50,906,183  | 7,185,732
LiveMapsBE    | 6,737                | 2,344              | 1,786          | 558               | 44,766,484     | 35,310,420  | 9,456,064
MSNFS         | 1,424                | 303                | 201            | 102               | 29,345,085     | 19,729,611  | 9,615,474
DevDivRelease | 4,620                | 428                | 252            | 176               | 18,195,701     | 12,326,432  | 5,869,269
prn           | 770                  | 271                | 194            | 77                | 16,819,297     | 9,066,281   | 7,753,016

23
What is the Performance of Gecko under Real Workloads?
• Gecko
– Mirrored chain of length 3
– Tail cache (2GB RAM + 32GB SSD)
– Body Cache (32GB SSD)
• Log + RAID10
– Mirrored, Log + 3 disk RAID-0
– LRU cache (2GB RAM + 64GB SSD)
24
Mix of 8 workloads: prn, MSNFS, DevDivRelease, proj, Exchange, LiveMapsBE, prxy, and src1
6-disk configuration with 200GB of data prefilled
[Figure: read and write throughput (MB/s) over time (s) for Gecko vs. Log + RAID10]
Gecko showed less read-write contention and a higher cache hit rate; Gecko's throughput is 2X-3X higher
What is the Effect of Varying Gecko Chain Length?
• Same 8 workloads with 200GB data prefilled
• Single uncontended disk
• Separating reads and writes
25
[Figure: read and write throughput (MB/s) for varying chain lengths and for RAID 0; error bars = stdev; separating reads from writes yields better performance]
How Effective Is the Tail Cache?
• Read hit rate of tail cache (2GB RAM + 32GB SSD) on a 512GB disk
• 21 combinations of 4 to 8 MSR Cambridge and MS Enterprise traces
26
Tail cache can effectively resolve read-write contention
• At least an 86% read hit rate; RAM handles most of the hot data
• The amount of data affects the hit rate, but it still averages over 80%
[Figure: hit rate (%) split into RAM and SSD contributions across the 21 application mixes; hit rate (%) vs. amount of data in disk (100-500 GB)]
How Durable Is the Flash-Based Tail Cache?
• Static analysis of lifetime based on cache hit rate
• Use of 2GB RAM extends SSD lifetime
27
2X-8X Lifetime Extension
[Figure: SSD lifetime (days) at a 40MB/s write rate across the 21 application mixes]
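The slides do not give the formula, but a back-of-envelope version of such a static lifetime analysis looks like the sketch below; the endurance figure is an assumption for illustration, not from the paper:

/* Illustrative SSD lifetime estimate.  If the SSD absorbed the full
 * 40MB/s write stream, a 32GB MLC device rated for an assumed 3,000
 * P/E cycles would last 32*1024*3000 / 40 / 86400 ≈ 28 days; the RAM
 * tier in front absorbs the hottest writes, cutting SSD traffic and
 * extending lifetime 2X-8X, as measured above. */
static double ssd_lifetime_days(double capacity_gb,
                                double endurance_cycles,
                                double ssd_write_mb_per_s)
{
    double total_writable_mb = capacity_gb * 1024.0 * endurance_cycles;
    return total_writable_mb / ssd_write_mb_per_s / 86400.0;
}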
Conclusion
• Gecko enables fast storage in the cloud
– Scales with increasing virtualization and number of cores
– Oblivious to I/O contention
• Gecko’s technical contribution
– Separates log tail from its body
– Separates reads and writes
– Tail cache absorbs reads going to tail
• A single sequentially accessed disk is better than multiple randomly seeking disks
28
Questions?
29