Using Hints to Improve Inline Block-Layer Deduplication

Sonam Mandal (1), Geoff Kuenning (3), Dongju Ok (1), Varun Shastry (1), Philip Shilane (4), Sun Zhen (1,5), Vasily Tarasov (2), Erez Zadok (1)

(1) Stony Brook University; (2) IBM Research – Almaden; (3) Harvey Mudd College; (4) EMC Corporation; (5) HPCL, NUDT, China

14th USENIX Conference on File and Storage Technologies (FAST 2016)



Outline

  • Introduction
  • Dmdedup Overview
  • Hints
  • Evaluation
  • Conclusion and Future Work

02/25/2016


Introduction: Storage Stack

[Figure: storage stack, top to bottom: Application; File System; Block Layer (simple read/write interface), where Deduplication sits; RAID; Block Device; SSD and HDD.]

  • Deduplication at the block layer is universal
  • Semantic context is lost


Introduction: Reliability

[Figure: storage stack: Application; File System; Block Layer (simple read/write interface); RAID; Block Device; SSD and HDD.]

  • File system writes multiple copies of the superblock for reliability
  • File system superblock gets deduplicated
  • Only one copy of the superblock is written to disk


Introduction: Efficient Resource Utilization

[Figure: storage stack: Application; File System; Block Layer (simple read/write interface); RAID; Block Device; SSD and HDD.]

  • Application writes out unique data blocks (the data does not deduplicate)
  • Deduplication overhead: CPU, memory, I/O to disk


Introduction: Performance

[Figure: storage stack: Application; File System; Block Layer (simple read/write interface); RAID; Block Device; SSD and HDD.]

  • Application issues a file copy within the same block device
  • The same data that is read is written out: predictable hashes will be accessed soon
  • Prefetch hashes to avoid extra I/O


Dmdedup Components

[Figure: Application and File System on top of the Dmdedup block device (deduplication logic). Below it are a Data Device (HDD) and a Metadata Device (SSD); the metadata comprises the hash index and other deduplication metadata.]

  • Stackable block device
  • Multiple metadata backends supported: inram and cowbtree
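
The component layout above can be illustrated with a small sketch. This is a toy model, not Dmdedup code: the `MetadataBackend` interface, the table names, the use of SHA-256, and the in-memory "data device" are all assumptions made for illustration. It shows deduplication logic that stores block contents on one device and keeps a hash index plus a logical-to-physical map behind a swappable metadata backend.

```python
import hashlib
from abc import ABC, abstractmethod


class MetadataBackend(ABC):
    """Key-value interface a metadata backend might expose (hypothetical)."""

    @abstractmethod
    def get(self, table, key): ...

    @abstractmethod
    def put(self, table, key, value): ...


class InRamBackend(MetadataBackend):
    """Keeps all metadata in dictionaries; nothing is persisted."""

    def __init__(self):
        self.tables = {"hash_index": {}, "lbn_map": {}}

    def get(self, table, key):
        return self.tables[table].get(key)

    def put(self, table, key, value):
        self.tables[table][key] = value


class ToyDedupTarget:
    """Toy stackable device: block I/O in, deduplicated blocks out."""

    def __init__(self, backend):
        self.meta = backend      # stands in for the metadata device
        self.data_device = []    # stands in for the data device

    def write(self, lbn, block):
        digest = hashlib.sha256(block).digest()  # hash choice is an assumption
        pbn = self.meta.get("hash_index", digest)
        if pbn is None:          # new content: store it and index its hash
            self.data_device.append(block)
            pbn = len(self.data_device) - 1
            self.meta.put("hash_index", digest, pbn)
        self.meta.put("lbn_map", lbn, pbn)       # mapping is updated either way

    def read(self, lbn):
        pbn = self.meta.get("lbn_map", lbn)
        return None if pbn is None else self.data_device[pbn]


dev = ToyDedupTarget(InRamBackend())
dev.write(0, b"A" * 4096)
dev.write(1, b"A" * 4096)    # duplicate of block 0
dev.write(2, b"B" * 4096)
print(len(dev.data_device))  # physical blocks backing three logical blocks
```

Keeping the backend behind a small interface is what would let a persistent structure replace the in-RAM dictionaries without touching the write path.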


NODEDUP Hint

[Figure: Application and File System on top of the Dmdedup block device (deduplication logic), with a Data Device (HDD) and a Metadata Device (SSD) holding the hash index and other deduplication metadata. The numbered steps below annotate the figure.]

Useful when:
  • Writing unique data: to avoid wasting resources
  • Duplicate chunks need to be stored: for reliability

Steps:
  1) open() flag: O_NODEDUP
  2) REQ_META flag, jbd2 process
  3) Hint is checked on the write path
  4) Hashing and the hash index update are skipped. Other data structures still need to be updated.
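
Steps 3 and 4 can be sketched as a toy write path. This is an illustrative model under stated assumptions, not the Dmdedup implementation: the class name, the `nodedup` parameter, the SHA-256 hash, and the in-memory structures are all hypothetical. When the hint is set, the block is stored without being hashed or indexed, while the logical-to-physical map is still updated.

```python
import hashlib


class ToyWritePath:
    """Toy model of a dedup write path that honors a no-dedup hint."""

    def __init__(self):
        self.hash_index = {}      # digest -> physical block number
        self.lbn_map = {}         # logical -> physical block number
        self.blocks = []          # stored physical blocks
        self.hashes_computed = 0  # counts the work the hint avoids

    def _store(self, block):
        self.blocks.append(block)
        return len(self.blocks) - 1

    def write(self, lbn, block, nodedup=False):
        if nodedup:
            # Hint set: skip hashing and the hash index entirely.
            pbn = self._store(block)
        else:
            self.hashes_computed += 1
            digest = hashlib.sha256(block).digest()
            pbn = self.hash_index.get(digest)
            if pbn is None:
                pbn = self._store(block)
                self.hash_index[digest] = pbn
        # Other metadata (here, the logical-to-physical map) is always updated.
        self.lbn_map[lbn] = pbn


path = ToyWritePath()
superblock = b"S" * 4096
path.write(0, superblock, nodedup=True)     # e.g. a metadata write
path.write(8192, superblock, nodedup=True)  # second copy kept for reliability
path.write(1, b"D" * 4096)                  # ordinary data write
path.write(2, b"D" * 4096)                  # deduplicated against block 1
```

In this model both superblock copies occupy their own physical block, and only the two unhinted writes pay for hashing.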


PREFETCH Hint

[Figure: Application and File System on top of the Dmdedup block device (deduplication logic), with a Data Device (HDD) and a Metadata Device (SSD) holding the hash index and other deduplication metadata. The numbered steps below annotate the figure.]

Useful when:
  • Hashes that will be used soon are already known: a hash index lookup is an expensive operation

Steps:
  1) open() flag: O_PREFETCH
  2) The hash is not usually added on the read path. With the hint, the hash is added to the prefetch cache on the read path.
  3) The hash index lookup is skipped if the hash is found in the prefetch cache on write. Other data structures still need to be looked up.
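
Steps 2 and 3 can be sketched with a toy model. Everything here is an assumption made for illustration (class and method names, the `prefetch` parameter, SHA-256, dictionaries standing in for the hash index and prefetch cache); it is not the Dmdedup implementation. A hinted read places the block's hash in a prefetch cache, and a later write of the same data finds the hash there and skips the hash index lookup.

```python
import hashlib


def digest_of(block):
    return hashlib.sha256(block).digest()


class ToyPrefetchDedup:
    """Toy dedup device with a prefetch cache filled on hinted reads."""

    def __init__(self):
        self.hash_index = {}      # digest -> pbn (treated as expensive)
        self.prefetch_cache = {}  # digest -> pbn (cheap, in memory)
        self.lbn_map = {}         # logical -> physical block number
        self.blocks = []
        self.index_lookups = 0    # counts hash index lookups

    def write(self, lbn, block):
        digest = digest_of(block)
        pbn = self.prefetch_cache.get(digest)
        if pbn is None:
            self.index_lookups += 1        # fall back to the hash index
            pbn = self.hash_index.get(digest)
            if pbn is None:
                self.blocks.append(block)
                pbn = len(self.blocks) - 1
                self.hash_index[digest] = pbn
        self.lbn_map[lbn] = pbn            # other metadata is still updated

    def read(self, lbn, prefetch=False):
        pbn = self.lbn_map[lbn]
        block = self.blocks[pbn]
        if prefetch:
            # Reads normally do not touch hashes; with the hint, cache one.
            self.prefetch_cache[digest_of(block)] = pbn
        return block


dev = ToyPrefetchDedup()
for lbn in range(4):
    dev.write(lbn, bytes([lbn]) * 4096)    # four unique source blocks
lookups_before = dev.index_lookups
for lbn in range(4):                       # copy, with the hint on reads
    dev.write(100 + lbn, dev.read(lbn, prefetch=True))
```

In this model the copy adds no hash index lookups and no new physical blocks, because every written hash was already placed in the prefetch cache by the preceding read.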


Evaluation: Workloads

  • NODEDUP hint:
      • Filebench’s Fileserver workload
          • Data and metadata ops emulating a file server
      • Configuration
          • No NODEDUP hint
          • Metadata only marked with NODEDUP hint
          • Metadata and data marked with NODEDUP hint
  • PREFETCH hint:
      • File copy workload modified to pass PREFETCH hint
  • Dmdedup’s cowbtree backend used for all experiments


NODEDUP Hint: Fileserver Workload

  • Modified Filebench
      • Write data based on a given duplicate distribution (unique data for the results shown)
  • Modified Fileserver workload definition
      • 4KB blocks instead of 1KB blocks
  • Experimented with different file systems


NODEDUP Hint: Fileserver Workload on Nilfs2

[Figure: bar chart. Y-axis: Throughput (Kiops), 0 to 5. X-axis: Dmdedup Cache Size (KB/% of total) at 640k (1%), 1280k (2%), 3205k (5%), 6410k (10%). Series: no-hint, md-hint-on, data+md-hint-on. One bar group is annotated "4.5x Increase".]

  • 3.5x to 4.5x increase in throughput
  • Percentages are relative to the total cache space required to store dedup metadata for the given dataset.
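
As a worked check of the axis labels, each cache size and its percentage imply a total metadata footprint (size divided by fraction). The sketch below uses only the four labels from the chart; the variable names are illustrative, and the numbers are in whatever unit the axis labels use.

```python
# Cache sizes and their stated share of the total, taken from the x-axis labels.
points = [(640, 0.01), (1280, 0.02), (3205, 0.05), (6410, 0.10)]

# Each label implies a total: size / fraction.
implied_totals = [size / frac for size, frac in points]

for (size, frac), total in zip(points, implied_totals):
    print(f"{size}k at {frac:.0%} implies a total of about {total:,.0f}k")
```

The four implied totals agree to within rounding, which suggests the labels were derived from a single total of roughly 64,000k.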


PREFETCH Hint: File Copy Workload

  • Copy a 1GB file with unique 4KB blocks
      • Modified dd: O_PREFETCH open() flag on the read path
      • Unmodified dd
  • Called sync and umount to make sure data reached the block layer
  • Experimented with different file systems
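
The modified-dd idea can be sketched from the application side. This is a minimal sketch, not the authors' dd patch: `O_PREFETCH` is not defined by stock kernels or by Python's `os` module, so the code looks it up and falls back to 0 (no hint), which keeps it runnable anywhere. The function name and the 4KB block size are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical: O_PREFETCH exists only on a kernel patched for these hints.
# Python's os module does not define it, so this falls back to 0 (no hint).
O_PREFETCH = getattr(os, "O_PREFETCH", 0)
BLOCK_SIZE = 4096


def copy_with_prefetch_hint(src_path, dst_path):
    """Copy src to dst in 4KB blocks, passing the hint on the read side only."""
    src = os.open(src_path, os.O_RDONLY | O_PREFETCH)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            block = os.read(src, BLOCK_SIZE)
            if not block:
                break
            view = memoryview(block)
            while len(view) > 0:          # handle short writes
                written = os.write(dst, view)
                view = view[written:]
        os.fsync(dst)                     # push data toward the block layer
    finally:
        os.close(src)
        os.close(dst)


# Usage: copy a small file of random (hence unique) blocks and verify it.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "src.bin")
    dst_file = os.path.join(tmp, "dst.bin")
    with open(src_file, "wb") as f:
        f.write(os.urandom(16 * BLOCK_SIZE))
    copy_with_prefetch_hint(src_file, dst_file)
    with open(src_file, "rb") as a, open(dst_file, "rb") as b:
        copied_ok = a.read() == b.read()
```

Only the source descriptor carries the hint, matching the slide's "on the read path"; the destination is opened normally.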


PREFETCH Hint: File Copy Workload on Nilfs2

[Figure: bar chart. Y-axis: Elapsed Time (sec), 0 to 80. X-axis: Raw Filesystem, then Dmdedup Cache Size (KB/% of total) at 102k (1%), 205k (2%), 512k (5%), 1024k (10%). Series: no-hint, hint-on. One bar group is annotated "1.8x improvement".]

  • 1.5x to 1.8x improvement in copy time
  • Percentages are relative to the total cache space required to store dedup metadata for the given dataset.


Conclusion and Future Work

  • Conclusion: hints help
      • Add reliability
          • Duplicate file system metadata blocks
      • Improve performance
          • Prefetch hashes
      • Use resources efficiently
          • Avoid deduplication overhead for unique data
  • Future Work
      • Hint support for other file systems
      • PREFETCH hint for Nilfs2 segment cleaning
      • Cross-block-device support for hash prefetching



Q&A

More results in the paper

Git Repository for Dmdedup: git://git.fsl.cs.sunysb.edu/linux-dmdedup.git