Conquest: Preparing for Life After Disks An-I Andy Wang Geoff Kuenning, Peter Reiher, Gerald Popek


Conquest: Preparing for Life After Disks

An-I Andy Wang

Geoff Kuenning, Peter Reiher, Gerald Popek

Conquest Overview
File systems are optimized for disks
    Performance problem
    Complexity
Now we have tons of inexpensive RAM
    What can we do with that RAM?

Conquest Approach
Combine disk and persistent RAM (e.g., battery-backed RAM) in a novel way
Simplification
    > 20% fewer semicolons than ext2, reiserfs, and SGI XFS
Performance (under popular benchmarks)
    24% to 1900% faster than LRU disk caching

Outline of the Talk
Motivation
Conquest design (high level)
Conquest components
Performance evaluation
Conclusion

Motivation – Conquest Design – Conquest Components – Performance Evaluation – Conclusion

Motivation
Most file systems are built for disks
Problems with the disk assumption:
    Performance
    Complexity

Hardware Evolution
[Figure: accesses per second (log scale, 1 KHz to 1 GHz) versus year (1990, 1995, 2000). CPU and memory improve at 50% per year; disk improves at 15% per year. Annotations mark gaps of 10^5 and 10^6 (1 sec : 6 days; 1 sec : 3 months).]
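The growth rates in the figure compound quickly. As a small worked calculation, the sketch below takes the 50% and 15% annual rates from the slide and assumes, for illustration only, that they hold steadily over a ten-year span. The function names are hypothetical.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Total improvement after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def relative_gap(fast_rate: float, slow_rate: float, years: int) -> float:
    """How much further the faster component pulls ahead over the period."""
    return growth_factor(fast_rate, years) / growth_factor(slow_rate, years)


# Rates from the slide; the ten-year horizon is an assumption.
cpu_or_memory = growth_factor(0.50, 10)
disk = growth_factor(0.15, 10)
widening = relative_gap(0.50, 0.15, 10)
print(f"CPU/memory: {cpu_or_memory:.1f}x, disk: {disk:.1f}x, gap widens {widening:.1f}x")
```

Under these assumptions the gap between CPU or memory and disk widens by more than an order of magnitude per decade, on top of whatever gap already existed.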

Inside Pandora’s Box
[Figure: the inside of a disk drive, showing the disk arm and the disk platters]
Access time = seek time (disk arm) + rotational delay (disk platter) + transfer time
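The access-time formula can be made concrete with a short calculation. The drive parameters below (9 ms average seek, 7200 RPM, 40 MB/s transfer rate, a 4 KB request) are illustrative assumptions, not figures from the talk, and the average rotational delay is taken as half a rotation.

```python
def access_time_ms(seek_ms: float, rpm: float,
                   transfer_mb_per_s: float, request_bytes: int) -> float:
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    # On average the platter must turn half a rotation to reach the data.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Time to move the requested bytes at the sustained transfer rate.
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000.0) * 1_000.0
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical drive: 9 ms seek, 7200 RPM, 40 MB/s, one 4 KB request.
total = access_time_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=40.0,
                       request_bytes=4096)
print(f"{total:.3f} ms per access")
```

For a small request on such a drive, the mechanical terms (seek and rotation) dominate and the transfer term is nearly negligible.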

Disk Optimization Methods
Disk arm scheduling
Group information on disk
Disk readahead
Buffered writes
Disk caching
Data mirroring
Hardware parallelism

Complexity Bytes
synchronization
predictive readahead
cache replacement
elevator algorithm
data clustering
data consistency
asynchronous write

Storage Media Alternatives
[Figure: accesses/sec (log scale) versus $/MB (log scale); recovered tick labels are 10^-3, 10^0, 10^3, and 10^6. Plotted media: tape, disk, (write once) flash memory, battery-backed DRAM, and magnetic RAM (marked with a question mark). The label "persistent RAM" also appears on the plot.]
[Caceres et al., 1993; Hillyer et al., 1996; Qualstar 1998; Tanisys 1999; Micron Semiconductor Products 2000; Quantum 2000]

Price Trend of Persistent RAM
[Figure: $/MB (log scale, 10^-2 to 10^2) versus year (1995 to 2005) for paper/film, 3.5" HDD, 2.5" HDD, 1" HDD, and persistent RAM. Annotations: booming of digital photography; 4 to 10 GB of persistent RAM.]
[Grochowski 2000]

Old Order; New World
Disk will stay around
    Cost, capacity, power, heat
RAM as a viable storage alternative
    PDAs, digital cameras, MP3 players
More architectural changes due to RAM
    A big assumption change from disk
    Rethink data structures, interfaces, applications

Getting a Fresh Start
What does it take to design and build a system that assumes ample persistent RAM as the primary storage medium?

Conquest Design
Design and build a disk/persistent-RAM hybrid file system
Deliver all file system services from memory, with the exception of high-capacity storage
Two separate data paths to memory and disk
Benefits:
    Simplicity
    Performance

Simplicity
Remove disk-related complexities for most files
Make things simpler for disk as well
Less complexity
    Fewer bugs
    Easier maintenance
    Shorter data paths

Performance
Overall
    All management performed in memory
Memory data path
    No disk-related overhead
Disk data path
    Faster speed due to simpler access models

Conquest Components
Media management
Metadata representation
Directory service
Allocation service
Persistence support
Resiliency support

User Access Patterns
Small files
    Take little space (10%)
    Represent most accesses (90%)
Large files
    Take most space
    Mostly sequential accesses
Not characteristic of database applications
[Iram 1993; Douceur et al., 1999; Roselli et al., 2000]

Files Stored in Persistent RAM
Small files (< 1MB)
    No seek time or rotational delays
    Fast byte-level accesses
    Contiguous allocation
Metadata
    Fast synchronous update
    No dual representations
Executables and shared libraries
    In-place execution
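The placement rule on this slide can be sketched as a small decision function. This is an illustration under stated assumptions, not Conquest's actual code: the names `Medium` and `choose_medium` are hypothetical, 1 MB is taken as 2^20 bytes, and the bound is treated as inclusive because the benchmark slides describe 1-MB files as in-core and 1.01-MB files as on-disk.

```python
from enum import Enum

# Assumption: 1 MB means 2**20 bytes, and a file of exactly 1 MB stays in RAM.
SMALL_FILE_LIMIT = 1 << 20


class Medium(Enum):
    PERSISTENT_RAM = "persistent RAM"
    DISK = "disk"


def choose_medium(size_bytes: int, is_metadata: bool = False,
                  is_executable: bool = False) -> Medium:
    """Pick where an object's content lives, following the slide's three categories."""
    if is_metadata or is_executable:
        # Metadata, executables, and shared libraries stay in persistent RAM.
        return Medium.PERSISTENT_RAM
    if size_bytes <= SMALL_FILE_LIMIT:
        return Medium.PERSISTENT_RAM
    # Only the data of large files goes to disk.
    return Medium.DISK


print(choose_medium(4 * 1024).value)
print(choose_medium(700 * (1 << 20)).value)
```

The point of the split is that the common case (small files and all metadata) never touches disk-oriented code paths.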

Memory Data Path of Conquest
[Figure: two data paths side by side.
Conventional file systems: storage requests, IO buffer management, IO buffer, persistence support, disk management, disk.
Conquest memory data path: storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

Large-File-Only Disk Storage
Allocate in big chunks
    Lower access overhead
    Reduced management overhead
No fragmentation management
No tricks for small files
    Storing data in metadata
No elaborate data structures
    Wrapping a balanced tree onto disk cylinders
[Devlinux.com 2000]

Sequential-Access Large Files
Sequential disk accesses
    Near-raw bandwidth
Well-defined readahead semantics
Read-mostly
    Little synchronization overhead (between memory and disk)

Disk Data Path of Conquest
[Figure: two data paths side by side.
Conventional file systems: storage requests, IO buffer management, IO buffer, persistence support, disk management, disk.
Conquest disk data path: storage requests, IO buffer management, IO buffer, disk management, disk (large-file-only file system), alongside battery-backed RAM (small file and metadata storage).]

Random-Access Large Files
Random access?
    Common definition: nonsequential access
    A typical movie has 150 scene changes
    MP3 stores the title at the end of the files
Near sequential access?
    Simplifies large-file metadata representation significantly

Logical File Representation
[Figure: a file consists of name(s), an i-node holding file attributes, and data]

Physical File Representation
[Figure: a file consists of name(s), an i-node holding file attributes and data locations, and data blocks]

Ext2 Data Representation
[Figure: an ext2 i-node holds 12 data block locations plus index block locations; each index block holds data block locations or further index block locations]

Disadvantages of the Ext2 Design
Designed for disk storage
Optimization for small files makes things complex
Random-access data structure for large files that are accessed mostly sequentially
Data access time dependent on the byte position in a file
Maximum file size is limited
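To see why access time depends on the byte position, the sketch below counts how many index blocks must be consulted before the data block at a given offset is reached. It follows the standard ext2 layout of 12 direct pointers followed by single, double, and triple indirection; the 4 KB block size and 4-byte pointers are assumptions chosen for the example, and other ext2 limits on file size are ignored.

```python
DIRECT_POINTERS = 12  # direct data block locations held in the i-node


def index_blocks_to_read(offset: int, block_size: int = 4096,
                         pointer_size: int = 4) -> int:
    """Number of index blocks consulted before reaching the data block at offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    pointers_per_block = block_size // pointer_size
    block = offset // block_size
    if block < DIRECT_POINTERS:
        return 0  # the i-node points at the data block directly
    block -= DIRECT_POINTERS
    span = pointers_per_block  # blocks reachable at the current indirection level
    for level in (1, 2, 3):
        if block < span:
            return level
        block -= span
        span *= pointers_per_block
    raise ValueError("offset exceeds what triple indirection can address")


for offset in (0, 48 * 1024, 5 * 1024 * 1024, 5 * 1024 ** 3):
    print(offset, index_blocks_to_read(offset))
```

The final `ValueError` also illustrates the last bullet: with a fixed number of indirection levels, the maximum file size is bounded by the structure itself.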

Conquest Representation
Persistent RAM
    Hash(file name) = location of data
    Offset(location of data)
Disk storage
    Per-file, doubly linked list of disk block segments (stored in persistent RAM)
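The disk-storage half of this representation can be sketched as a doubly linked list of contiguous segments that is walked to translate a logical block number into a disk block. The class and field names (`Segment`, `LargeFileMap`, `disk_start`, `length`) are hypothetical, and the segment contents in the usage lines are made up for illustration.

```python
class Segment:
    """One contiguous run of disk blocks belonging to a file."""

    def __init__(self, disk_start, length, prev=None):
        self.disk_start = disk_start  # first disk block of the run
        self.length = length          # number of blocks in the run
        self.prev = prev
        self.next = None


class LargeFileMap:
    """Per-file doubly linked list of segments; the list itself would live in RAM."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, disk_start, length):
        seg = Segment(disk_start, length, prev=self.tail)
        if self.tail is None:
            self.head = seg
        else:
            self.tail.next = seg
        self.tail = seg

    def locate(self, file_block):
        """Translate a logical block number by walking the list from the head."""
        seg, remaining = self.head, file_block
        while seg is not None:
            if remaining < seg.length:
                return seg.disk_start + remaining
            remaining -= seg.length
            seg = seg.next
        raise IndexError("block is beyond the end of the file")


movie = LargeFileMap()
movie.append(disk_start=1000, length=4)
movie.append(disk_start=5000, length=2)
print(movie.locate(5))
```

A sequential reader only ever steps to the next segment; a random access pays for a linear walk, but that walk happens entirely in memory.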

Advantages of the Conquest Design
Direct data access for in-core files
Worst case: sequential memory search for random disk locations
Maximum file size limited by physical storage

Directory Service Requirements
Fast sequential traversal (e.g., ls)
Fast random lookup (e.g., locate file x)
Hard links (apply multiple names to data)

First Design
A doubly hashed table for each directory
    Conserves space
Problems:
    Dynamic resizing of directories
    Need to handle the current file position
        Important for rm -fr

Second Design
A variant of extensible hash table for each directory
An old data structure fits nicely
[Figure: an extensible hash table for a directory, shown at two sizes. Slots hold hash-key and name pairs such as 0100 | file1, 1001 | file2, 0011 | dir1, and 1110 | file2_hardlink; unused slots are empty.]
[Fagin et al., 1979]
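A minimal sketch of a textbook extendible hash table, in the spirit of the structure this slide cites, is shown below. It is not Conquest's variant: the class names, the bucket size of 2, and the use of `zlib.crc32` as the hash are all assumptions made for illustration. Slots are indexed by the low-order bits of the hash, so doubling the table is a simple concatenation.

```python
import zlib


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: low-order hash bits shared by all keys here
        self.items = {}     # name -> metadata ID


class ExtendibleDirectory:
    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 0
        self.slots = [Bucket(0)]  # always 2**global_depth references to buckets

    def _hash(self, name):
        # Stand-in hash, chosen only because it is deterministic across runs.
        return zlib.crc32(name.encode("utf-8"))

    def _bucket(self, name):
        return self.slots[self._hash(name) & ((1 << self.global_depth) - 1)]

    def lookup(self, name):
        return self._bucket(name).items.get(name)

    def insert(self, name, meta_id):
        while True:
            bucket = self._bucket(name)
            if name in bucket.items or len(bucket.items) < self.bucket_size:
                bucket.items[name] = meta_id
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth >= 32:
            raise RuntimeError("full hash collision; needs overflow handling")
        if bucket.depth == self.global_depth:
            self.slots = self.slots + self.slots  # double the slot table
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        new_bit = 1 << (bucket.depth - 1)
        for name in list(bucket.items):
            if self._hash(name) & new_bit:
                sibling.items[name] = bucket.items.pop(name)
        for i, b in enumerate(self.slots):
            if b is bucket and (i & new_bit):
                self.slots[i] = sibling

    def names(self):
        """Sequential traversal: visit each distinct bucket once."""
        seen = set()
        for b in self.slots:
            if id(b) not in seen:
                seen.add(id(b))
                yield from b.items


d = ExtendibleDirectory()
for name, meta_id in [("file1", 100), ("file2", 200), ("dir1", 300),
                      ("file2_hardlink", 200)]:
    d.insert(name, meta_id)
print(sorted(d.names()), d.lookup("file2_hardlink"))
```

In this sketch a hard link is simply a second name that maps to the same metadata ID, and the table grows by splitting one bucket at a time rather than rehashing the whole directory.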

Additional Engineering Details
Popular hash functions randomize lower bits
Dynamic file positioning
Need to handle collisions
Memory overhead and complexity tradeoffs

Metadata Allocation Requirements
Keep track of usage status of metadata entries
Avoid duplicate allocation with unique IDs
Fast retrieval of metadata with a given ID
[Figure: metadata table]
ID    Status
1     free
2     in use
3     free
4     free
5     in use
6     free

Existing Memory Allocation Services
Keep track of unallocated memory
No duplicate allocation of physical addresses
Hmm…
[Figure: memory allocation table]
ADDR         Status
0xe000000    free
0xe000038    in use
0xe000070    free
0xe0000A8    free
0xe0000E0    free
0xe000118    in use

Conquest Metadata Management
Metadata = memory allocated by memory manager
Metadata ID = physical address of metadata
[Figure: the metadata ID table (IDs 1 to 6, each free or in use) is mapped onto the memory allocator's address table (0xe000000 to 0xe000118, each free or in use). The allocator provides the usage status; the addresses provide unique IDs and fast retrieval.]
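The idea that the allocator's address can double as the metadata ID can be sketched with a toy arena. The base address and the 0x38 spacing are borrowed from the figure; the class `MetadataArena`, its linear free-slot scan, and the dictionary payloads are assumptions for illustration, not Conquest's allocator.

```python
ARENA_BASE = 0xE000000  # first address shown in the figure
SLOT_SIZE = 0x38        # spacing between consecutive addresses in the figure


class MetadataArena:
    """Toy allocator whose returned address is also the metadata ID."""

    def __init__(self, slots):
        self.entries = [None] * slots  # None marks a free slot

    def allocate(self, metadata):
        if metadata is None:
            raise ValueError("metadata must not be None")
        for index, entry in enumerate(self.entries):
            if entry is None:
                self.entries[index] = metadata
                return ARENA_BASE + index * SLOT_SIZE  # address doubles as ID
        raise MemoryError("no free metadata slot")

    def _index(self, meta_id):
        index, remainder = divmod(meta_id - ARENA_BASE, SLOT_SIZE)
        if remainder or not 0 <= index < len(self.entries):
            raise KeyError(hex(meta_id))
        return index

    def get(self, meta_id):
        """Retrieval is address arithmetic; there is no separate ID table."""
        entry = self.entries[self._index(meta_id)]
        if entry is None:
            raise KeyError(hex(meta_id))
        return entry

    def free(self, meta_id):
        self.entries[self._index(meta_id)] = None


arena = MetadataArena(slots=6)
file_id = arena.allocate({"name": "file1", "size": 4096})
print(hex(file_id), arena.get(file_id)["name"])
```

Because the allocator already tracks which addresses are in use and never hands out the same one twice, the three requirements from the earlier slide come for free.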

Persistence Support
Restore file system states after a reboot
    Data
    Metadata
    Memory manager
        Keep track of metadata allocation

Linux Memory Manager (1)
Page allocator maintains individual pages
[Figure: page allocator]

Linux Memory Manager (2)
Zone allocator allocates memory in power-of-two sizes
[Figure: page allocator and zone allocator]

Linux Memory Manager (3)
Slab allocator groups allocations by sizes to reduce internal memory fragmentation
[Figure: page allocator, zone allocator, and slab allocator]
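A little arithmetic shows why grouping same-sized objects reduces internal fragmentation compared with rounding every request up to a power of two. The 56-byte object size (0x38, matching the address spacing in the earlier metadata figure) and the 4096-byte page are assumptions, and the sketch ignores the bookkeeping headers a real slab allocator keeps.

```python
PAGE_SIZE = 4096  # assumed page size


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n."""
    if n < 1:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()


def power_of_two_waste(object_size: int) -> int:
    """Bytes lost per object when each request is rounded up to a power of two."""
    return next_power_of_two(object_size) - object_size


def slab_layout(object_size: int, page_size: int = PAGE_SIZE):
    """Objects packed back to back in one page, and the bytes left over."""
    per_page = page_size // object_size
    return per_page, page_size - per_page * object_size


size = 56  # hypothetical metadata object size
per_page, leftover = slab_layout(size)
rounded_per_page = PAGE_SIZE // next_power_of_two(size)
print(f"power-of-two: {rounded_per_page} objects/page, "
      f"{power_of_two_waste(size) * rounded_per_page} bytes wasted")
print(f"slab-style:   {per_page} objects/page, {leftover} bytes wasted")
```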

Linux Memory Manager (4)
Difficult to restore the persistent states
    Three layers of pointer-rich mappings
    Mixing of persistent and temporary allocations
[Figure: page allocator, slab allocator, and zone allocator]

Conquest Persistence
Create memory zones with own instantiations of memory managers
[Figure: a memory zone with its own page allocator, slab allocator, and zone allocator]

Conquest Persistence
Encapsulate all pointers within each zone
    Pointers can survive reboots
    No serialization and deserialization
Swapping and paging
    Disabled for Conquest memory zones
    Enabled for non-Conquest zones

Resiliency Support
Instantaneous metadata commit
No fsck (ad hoc metadata consistency check)
Built-in checkpointing
Pointer-switch commit semantics
[Figure: pointer-switch commit]
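Pointer-switch commit semantics can be illustrated with a tiny in-memory store: an update is prepared on a private copy and becomes visible through a single reference assignment. This is a generic sketch of the technique, not Conquest's implementation; the class `Store`, the use of `copy.deepcopy`, and the dictionary-shaped metadata are assumptions.

```python
import copy


class Store:
    """Holds one committed state; updates become visible by switching a reference."""

    def __init__(self, initial):
        self._committed = initial  # the single pointer that readers follow

    def read(self):
        return self._committed

    def update(self, mutate):
        draft = copy.deepcopy(self._committed)  # prepare changes on a private copy
        mutate(draft)  # if this raises, the committed state is untouched
        self._committed = draft  # the commit itself is one pointer switch


def grow(meta):
    meta["size"] = 8192


def failing_update(meta):
    meta["size"] = -1
    raise RuntimeError("simulated crash before commit")


inode = Store({"name": "file1", "size": 4096})
inode.update(grow)
try:
    inode.update(failing_update)
except RuntimeError:
    pass
print(inode.read())
```

A failure before the switch leaves the old state fully intact, which is the property that makes an after-the-fact consistency check unnecessary in this model.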

Implementation Status
Kernel module under Linux 2.4.2
Fully functional and POSIX compliant
Modified memory manager to support Conquest persistence
Need to overcome BIOS limitations for distribution
Looking for licensing opportunities

Performance Evaluation
Architectural simplification
    Feature count
Performance improvement
    Memory-only workload
    Memory and disk workload

Conventional Data Path
Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management
[Figure: conventional file systems: storage requests, IO buffer management, IO buffer, persistence support, disk management, disk]

Memory Path of Conquest
Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management
Memory manager encapsulation
[Figure: Conquest memory data path: storage requests, persistence support, battery-backed RAM (small file and metadata storage)]

Disk Path of Conquest
Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management
[Figure: Conquest disk data path: storage requests, IO buffer management, IO buffer, disk management, disk (large-file-only file system), alongside battery-backed RAM (small file and metadata storage)]

PostMark Benchmark (1)
ISP workload (emails, web-based transactions)
Conquest is comparable to ramfs
At least 24% faster than the LRU disk cache
[Figure: trans / sec (0 to 9000) versus number of files (5000 to 30000) for SGI XFS, reiserfs, ext2fs, ramfs, and Conquest. 40 to 250 MB working set with 2 GB physical RAM.]
[Katcher 1997; Sweeney et al., 1996; Card et al., 1999; Namesys 2002]

PostMark Benchmark (2)
When both memory and disk components are exercised, Conquest can be several times faster than ext2fs, reiserfs, and SGI XFS
[Figure: trans / sec (0 to 5000) versus percentage of large files (0.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest. 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM. The plot is divided into two regions: working set <= RAM and working set > RAM.]

PostMark Benchmark (3)
When working set > RAM, Conquest is 1.4 to 2 times faster than ext2fs, reiserfs, and SGI XFS
[Figure: trans / sec (0 to 120) versus percentage of large files (6.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest. 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM.]

Sprite LFS Microbenchmarks (1)
Small-file benchmark
    Operates on 10,000 1-KB files in three phases
[Figure: op / sec (0 to 180000) for the create, read, and delete phases; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

Sprite LFS Microbenchmarks (2)
Modified large-file microbenchmark: 10 1-MB files (Conquest in-core files)
[Figure: MB / sec (0 to 700) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

Sprite LFS Microbenchmarks (3)
Modified large-file microbenchmark: 10 1.01-MB files (Conquest on-disk files)
[Figure: MB / sec (0 to 700) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

Sprite LFS Microbenchmarks (4)
Large-file microbenchmark: 40 100-MB files (Conquest on-disk files)
[Figure: MB / sec (0 to 30) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, Conquest]

History’s Mystery
Puzzling Microbenchmark Numbers…
Geoffrey Kuenning: “If Conquest is slower than ext2, I will toss you off of the balcony…”

With me hanging off a balcony…
Original large-file microbenchmark: 1-MB file (Conquest in-core file)
[Figure: MB / sec (0 to 700) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

Odd Microbenchmark Numbers
Why are random reads slower than sequential reads?
[Figure: MB / sec (0 to 700) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

Odd Microbenchmark Numbers
Why are RAM-based file systems slower than disk-based file systems?
[Figure: MB / sec (0 to 700) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

A Series of Hypotheses
Warm-up effect?
    Maybe
    Why do RAM-based systems warm up slower?
Bad initial states?
    No
Pentium III streaming IO option?
    No

Effects of Cache Footprint Sizes
[Figure: two panels, large cache footprint and small cache footprint. In each panel a file is written sequentially, leaving the cache footprint at the file end, and the same file is then read sequentially; the panels mark where the read flushes the footprint.]
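One plausible reading of this figure can be reproduced with a toy simulation: write a file sequentially through a small cache, then read it back sequentially, and count how many reads hit. The strict LRU policy and the block counts are assumptions made for illustration; a real L2 cache is set-associative and behaves less cleanly.

```python
from collections import OrderedDict


def sequential_write_then_read(file_blocks: int, cache_blocks: int) -> int:
    """Write blocks 0..n-1 in order, read them back in order, return read hits."""
    cache = OrderedDict()  # keys are block numbers, ordered oldest to newest

    def touch(block: int) -> bool:
        hit = block in cache
        if hit:
            cache.move_to_end(block)  # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
        return hit

    for block in range(file_blocks):
        touch(block)  # the write phase leaves the end of the file cached
    return sum(touch(block) for block in range(file_blocks))


for file_blocks, cache_blocks in [(8, 16), (16, 16), (16, 8)]:
    hits = sequential_write_then_read(file_blocks, cache_blocks)
    print(f"file={file_blocks} cache={cache_blocks} read hits={hits}")
```

When the file does not fit, the write phase leaves only the tail of the file cached, and the sequential read that starts at the beginning evicts that tail before ever reaching it. A benchmark phase order can therefore decide whether the cache helps at all.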

Sprite LFS Microbenchmarks
Modified large-file microbenchmark: 10 1-MB files (Conquest in-core files)
Random accesses are faster than sequential accesses due to cache reuse
[Figure: MB / sec (0 to 700) for the phases seq write, seq read, rand write, rand read, seq read; series: SGI XFS, reiserfs, ext2fs, ramfs, Conquest]

Lessons Learned
Faster than LRU caching, unexpected
    Heavyweight disk handling
    Severe penalty for accessing memory content
Matching user access patterns to storage media offers considerable simplification and better performance
    Not an automatic result
    Need careful design

More Lessons Learned
Effects of L2 caching become highly visible in memory workloads (modern workloads)
Cannot blindly apply existing disk-based microbenchmarks to measure memory performance of file systems
Need to consider states of L2 cache and memory behaviors at each stage of microbenchmarking

Additional Lessons Learned
Don’t discuss your performance numbers next to a balcony…unless…

Related Work (1)
Disk caching
    Assumption of scarce memory
    Complex mechanisms to maintain consistency
        Especially with the presence of metadata
RAM drives and RAM file systems
    Not meant to be persistent
    Use disk-related mechanisms
    Limitations on storage capacity
[McKusick et al., 1990; Ganger et al., 2000; Roselli et al., 2000; Seltzer et al., 2000]

Related Work (2)
Disk emulators
    RAM storage accessed through SCSI interface
Ad hoc approaches
    Manual transferring of files to and from ramfs
        Capacity limitation
    Background daemon to stage RAM files to a disk
        Semantic and name space problems
[Riedel 1998; ZDNet 1999]

Going Beyond Conquest (1)
Matching usage patterns with heterogeneous machines in the distributed domain
    Specialized tasks for machines within a cluster
    Preferably self-organizing and self-evolving
State-rich computing
    Caching of runtime data structures
    Similar to /tmp

Going Beyond Conquest (2)
Separate storage of metadata from data
    Association of metadata with data of different fidelity
    Opportunity for hierarchical replication across devices with different calibers
Benchmarking memory performance of file systems
    Developing new memory benchmarks

Contributions
Demonstrated the feasibility of disk-memory hybrid file systems
Showed performance does not preclude simplicity
Pinpointed cache-related problems with modern benchmarks
Opened doors to many exciting areas of research

Conclusion
Conquest demonstrates how rethinking changes in underlying assumptions can lead to significant architectural and performance improvements
Radical changes in hardware, applications, and user expectations in the past decade should lead us to rethink other aspects of OS as well.

Questions . . .
Conquest: http://lasr.cs.ucla.edu/conquest
Andy Wang: [email protected]