HARDFS: Hardening HDFS with Selective and Lightweight Versioning

Thanh Do, Tyler Harter, Yingchao Liu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau; Haryadi S. Gunawi


Page 2: Cloud Reliability

• Cloud systems
  - Complex software
  - Thousands of commodity machines
  - “Rare failures become frequent” [Hamilton]

• Failure detection and recovery
  - “… has to come from the software” [Dean]
  - “… must be a first-class operation” [Ramakrishnan et al.]

Page 3: Fail-stop failures

• Machine crashes, disk failures

• Pretty much handled

• Current systems have sophisticated crash-recovery machinery
  - Data replication
  - Logging
  - Fail-over

Page 4: Fail-silent failures

• Exhibit incorrect behaviors instead of crashing

• Caused by memory corruption or software bugs

• Crash recovery is useless if the fault can spread

[Diagram: Master and Workers]

Page 5: Fail-silent failure headlines

Page 6: Current approaches

• Replicated state machine using a BFT library
• N-version programming (Ver. 1, Ver. 2, Ver. 3: agree?)

• High resource consumption
• High engineering effort
• Rare deployment

Page 7: Selective and Lightweight Versioning (SLEEVE)

• 2nd version models basic protocols of the system

• Detects and isolates fail-silent behaviors

• Exploits crash recovery machinery for recovery

[Diagram: Master and Workers; state is reloaded from trusted sources during reboot]

Page 8: Selective and lightweight versioning (SLEEVE)

• Selective
  - Goal: small engineering effort
  - Protects important parts
    - Bug sensitive
    - Frequently changed
    - Currently unprotected

• Lightweight
  - Avoids replicating full state
  - Encodes states to reduce space

[Diagram: A B C D → A D (selective); bit array (lightweight)]

Page 9: HARDFS

• HARDFS, a hardened version of HDFS:
  - Namespace management
  - Replica management
  - Read/write protocol

• HARDFS detects and recovers from:
  - 90% of the faults caused by random memory corruption
  - 100% of the faults caused by targeted memory corruption
  - 5 injected software bugs

• Fast recovery using micro-recovery
  - 3 orders of magnitude faster than full reboot

• Little space and performance overhead

Page 10: Outline

✔ Introduction
• HARDFS Design
• HARDFS Implementation
• Evaluation
• Conclusion

Page 11: Case study: namespace integrity

[Diagram: three panels]
• Normal operation: Client → NameNode: Create(F); the NameNode records F and issues txCreate(F)
• Corrupted HDFS: F is corrupted to G; Client → NameNode: exists(F) → No (incorrect behavior)
• HARDFS: F is corrupted to G; Client → NameNode: exists(F) → Yes, with F available from the trusted source

Page 12: SLEEVE layer components

• Interposition module
• State manager
• Action verifier
• Recovery module

Page 13: State manager

• Replicates a subset of the state of the main version
  - Directory entries without modification time

• Adds new state incrementally
  - Adds permissions for security checks

• Understands semantics of various protocol messages and thread events to update state correctly

• Compresses state using compact encoding

Page 14: Naïve: Full replication

• HDFS master manages millions of files

• 100% memory overhead reduces HDFS master scalability [;login: ’11]

[Diagram: F stored twice, 100% memory overhead]

Page 15: Lightweight: Counting Bloom Filters

• Space-efficient data structure

• Supports 3 APIs
  - insert(“A fact”)
  - delete(“A fact”)
  - exists(“A fact”)

Page 16: Lightweight: Counting Bloom Filters

• Suitable for boolean checking
  - Does F exist?
  - Does F have length X?
  - Has block B been allocated?

[Diagram]
• “F is 10 bytes”: both versions hold F:10; the 2nd version calls insert(“F is 10 bytes”)
• “Give me length of F”: the main version's F:10 has become F:5; exists(“F is 5 bytes”) → NO
• Disagreement detected!
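
The three calls above can be sketched with a small counting Bloom filter. This is a minimal illustration, not the HARDFS implementation: the class name, the counter-array size, the number of hash functions, and the use of salted SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter over string facts (illustrative sizes only)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, fact):
        # Derive num_hashes counter indexes by salting one SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{fact}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def insert(self, fact):
        for pos in self._positions(fact):
            self.counters[pos] += 1

    def exists(self, fact):
        # True may be a false positive; False means the fact is not present.
        return all(self.counters[pos] > 0 for pos in self._positions(fact))

    def delete(self, fact):
        # Refuse to delete a fact that is not present, so counters stay in sync.
        if not self.exists(fact):
            raise KeyError(fact)
        for pos in self._positions(fact):
            self.counters[pos] -= 1


second_version = CountingBloomFilter()
second_version.insert("F is 10 bytes")

# A corrupted main version answers "5 bytes"; the filter does not confirm it.
agrees_on_10 = second_version.exists("F is 10 bytes")
agrees_on_5 = second_version.exists("F is 5 bytes")
```

Counters rather than single bits are what make `delete` possible: removing a fact decrements only its own contributions and leaves other facts that share a counter intact.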

Page 17: Challenges of using Counting Bloom Filters

• Hard to check stateful systems

• False positives

Page 18: Non-boolean verification

• Update: “F is 20 bytes”
• Steps the 2nd version would need:
    X = returnSize(F)
    delete(F:X)
    insert(F:20)
• Bloom filter does not support this API (returnSize)

[Diagram: before and after the update, F:10 becomes F:20]

Page 19: Non-boolean verification

• Update: “F is 20 bytes”
• Ask-Then-Check:
    X ← MainVersion.returnSize(F);
    IF exists(F:X)
      delete(F:X); insert(F:20);
    ELSE
      initiate recovery;

[Diagram: before and after the update, F:10 becomes F:20]
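
The ask-then-check steps can be written out as a short sketch. Everything here is illustrative: a plain Python set stands in for the counting Bloom filter, a dict stands in for the main version's state, and `RecoveryNeeded` and `ask_then_check_update` are hypothetical names.

```python
class RecoveryNeeded(Exception):
    """Raised when the two versions disagree (hypothetical name)."""


def ask_then_check_update(main_sizes, facts, name, new_size):
    """Replace the recorded fact about `name` with one for `new_size`.

    main_sizes: dict standing in for the main version's state.
    facts: set of fact strings standing in for the counting Bloom filter.
    """
    # Ask: the filter cannot return a size, so ask the main version.
    old_size = main_sizes[name]
    old_fact = f"{name}:{old_size}"
    # Check: the answer must match a fact the 2nd version recorded earlier.
    if old_fact not in facts:
        raise RecoveryNeeded(f"main version claims {old_fact}")
    facts.remove(old_fact)            # delete(F:X)
    facts.add(f"{name}:{new_size}")   # insert(F:20)


main_sizes = {"F": 10}
facts = {"F:10"}
ask_then_check_update(main_sizes, facts, "F", 20)
main_sizes["F"] = 20  # the main version applies the same update itself

main_sizes["F"] = 5   # simulate memory corruption in the main version
try:
    ask_then_check_update(main_sizes, facts, "F", 30)
    corruption_detected = False
except RecoveryNeeded:
    corruption_detected = True
```

The main version's answer is trusted only after the second version confirms it, so a corrupted size is caught before it is folded into the second version's state.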

Page 20: Stateful verification

[Diagram: Bloom filter (boolean verification) combined with ask-then-check gives checking of stateful systems]

Page 21: Dealing with false positives

• Bloom filters can give false positives
  - 4 per billion
  - 1 false positive per month (given 100 op/s)

• Only leads to unnecessary recovery

[Diagram: state is reloaded from the trusted source]
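
The "1 false positive per month" figure follows from the two numbers on the slide. The short calculation below checks it, assuming a 30-day month and one filter check per operation (both assumptions, not stated on the slide).

```python
FALSE_POSITIVE_RATE = 4e-9             # "4 per billion", from the slide
OPS_PER_SECOND = 100                   # from the slide
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

# Assumption: every operation triggers exactly one filter check.
checks_per_month = OPS_PER_SECOND * SECONDS_PER_MONTH
expected_false_positives = checks_per_month * FALSE_POSITIVE_RATE

print(f"{checks_per_month:,} checks/month -> "
      f"{expected_false_positives:.2f} expected false positives")
```

Under these assumptions there are 259,200,000 checks per month, which gives roughly one expected false positive.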

Page 22: Outline

✔ Introduction
• HARDFS Design
  ✔ Lightweight
  - Selective
  - Recovery
• HARDFS Implementation
• Evaluation
• Conclusion

Page 23: Selective Checks

• Goal: small engineering effort

• Selectively chooses namespace protection

• Excludes security checks

[Diagram: Client → HDFS Master: create(F); the master records F and writes txCreate(F) to the operation log. With F corrupted to G, a client's exists(F) gets No from the main version and Yes from the Bloom filter: disagreement detected!]

X ← mainVersion.exists(F);
Y ← bloomFilter.exists(F);
If X != Y then handleDisagreement();
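
The three-line check can be sketched as a runnable function. Plain Python sets stand in for the main version's namespace and for the Bloom filter, and `checked_exists` and its handler argument are hypothetical names; what the real system does inside the disagreement handler is not shown on the slide.

```python
def checked_exists(main_namespace, second_facts, path, handle_disagreement):
    """Answer exists(path) only when both versions agree."""
    x = path in main_namespace   # X <- mainVersion.exists(F)
    y = path in second_facts     # Y <- bloomFilter.exists(F)
    if x != y:
        handle_disagreement(path, x, y)
        return None              # no answer is released on disagreement
    return x


disagreements = []


def record_disagreement(path, main_answer, second_answer):
    # Stand-in handler: only records the event instead of starting recovery.
    disagreements.append((path, main_answer, second_answer))


main_namespace = {"/G"}   # "/F" was corrupted into "/G" in the main version
second_facts = {"/F"}     # the 2nd version still remembers "/F"

answer_f = checked_exists(main_namespace, second_facts, "/F", record_disagreement)
answer_h = checked_exists(main_namespace, second_facts, "/H", record_disagreement)
```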

Page 24: Incorrect action examples

• Normal correct action: Create(F) → txCreate(F)
• Corrupt action: Create(F) → reject
• Missing action: Create(F) → (no action)
• Orphan action: (no request) → txCreate(F)
• Out-of-order action: Mkdir(D), Create(D/F) → txCreate(D/F), txMkdir(D)

All of these happen in practice

Page 25: Action verifier

• Set of micro-checks to detect incorrect actions of the main version

• Mechanisms:
  - Expected-action list
  - Action dependency checking
  - Timeout
  - Domain knowledge to handle disagreement
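
One way to picture how the first three mechanisms fit together is the sketch below. It is an assumption-laden illustration, not the HARDFS action verifier: the class, its method names, the verdict strings, and the explicit `now` timestamps are all hypothetical.

```python
class ActionVerifier:
    """Expected-action list with dependency and timeout checks (hypothetical)."""

    def __init__(self, timeout_seconds=5.0):
        self.timeout_seconds = timeout_seconds
        self.expected = {}   # action -> deadline by which it must be observed
        self.done = set()    # actions already observed and accepted

    def expect(self, action, now):
        # Called when a request arrives that must cause `action`.
        self.expected[action] = now + self.timeout_seconds

    def observe(self, action, now, depends_on=()):
        # Called when the main version performs `action`.
        if action not in self.expected:
            return "orphan"          # no request asked for this action
        if any(dep not in self.done for dep in depends_on):
            return "out-of-order"    # a prerequisite has not happened yet
        del self.expected[action]
        self.done.add(action)
        return "ok"

    def missing(self, now):
        # Expected actions whose deadline passed without being observed.
        return sorted(a for a, deadline in self.expected.items() if now > deadline)


verifier = ActionVerifier(timeout_seconds=5.0)
verifier.expect("txMkdir(D)", now=0.0)
verifier.expect("txCreate(D/F)", now=0.0)
verifier.expect("txCreate(G)", now=0.0)

results = [
    verifier.observe("txCreate(D/F)", now=1.0, depends_on=["txMkdir(D)"]),
    verifier.observe("txMkdir(D)", now=2.0),
    verifier.observe("txCreate(D/F)", now=3.0, depends_on=["txMkdir(D)"]),
    verifier.observe("txCreate(H)", now=4.0),
]
overdue = verifier.missing(now=10.0)
```

In this sketch an action that was never requested is an orphan, an action whose prerequisite is not yet done is out of order, and an expected action that never appears is reported as missing once its timeout expires.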

Page 26: Outline

✔ Introduction
• HARDFS Design
  ✔ Lightweight
  ✔ Selective
  - Recovery
• HARDFS Implementation
• Evaluation
• Conclusion

Page 27: Recovery

• Crash is good provided no fault propagation

• Detects and turns bad behaviors into crashes

• Exploits HDFS crash recovery machinery

[Diagram: Master and Workers; state is reloaded from trusted sources during reboot]

Page 28: HARDFS Recovery

• Full recovery (crash and reboot)

• Micro-recovery
  - Repairing the main version
  - Repairing the 2nd version

Page 29: Crash and Reboot

• Full state is reconstructed from trusted sources

• Full recovery may be expensive
  - Restarting an HDFS master could take hours

[Diagram: reloading full state]

Page 30: Micro-recovery

• Repairs only corrupted state from trusted sources

• Falls back to full reboot when micro-recovery fails
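
The repair-then-fall-back policy can be sketched as follows. Dicts stand in for the in-memory state and for the trusted source (for example a checkpoint), and `recover` and its parameters are hypothetical names; the failure condition shown (an entry missing from the trusted source) is only one assumed way micro-recovery could fail.

```python
def recover(corrupted_keys, state, trusted_source, full_reboot):
    """Repair only the corrupted entries; fall back to a full reboot on failure."""
    try:
        for key in corrupted_keys:
            # Micro-recovery: reload just this entry from the trusted source.
            state[key] = trusted_source[key]
        return "micro-recovery"
    except KeyError:
        # The trusted source cannot repair this entry: rebuild everything.
        state.clear()
        state.update(full_reboot())
        return "full reboot"


checkpoint = {"F": 100, "G": 7}
state = {"F": 200, "G": 7}   # "F" is corrupted in memory

first = recover(["F"], state, checkpoint, lambda: dict(checkpoint))
second = recover(["H"], state, checkpoint, lambda: dict(checkpoint))
```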

Page 31: Repairing main version

• Trusted source (checkpoint file): F:100
• Main version: F:200; 2nd version: F:100
• Direct update of the main version: F:200 ← F:100

Page 32: Repairing 2nd version

• Trusted source (checkpoint file): F:100
• Main version: F:100; 2nd version: F:200
• Must:
  1. Delete(“F is 200 bytes”)
  2. Insert(“F is 100 bytes”)
• Solution:
  1. Start with an empty BF
  2. Add facts as they are verified
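
A possible reading of "start with an empty BF, add facts as they are verified" is sketched below. It assumes that a fact is verified by comparing the main version's answer against the trusted source the first time it is used; the slide does not say how verification happens, so treat that step, the class name, and the set standing in for the Bloom filter as assumptions.

```python
class RebuildingSecondVersion:
    """2nd version rebuilt from scratch after its own state was corrupted."""

    def __init__(self, trusted_source):
        self.trusted_source = trusted_source  # e.g. contents of a checkpoint
        self.facts = set()                    # starts empty (fresh filter)

    def check(self, name, size_from_main):
        fact = f"{name}:{size_from_main}"
        if fact in self.facts:
            return True                       # verified on an earlier access
        if self.trusted_source.get(name) == size_from_main:
            self.facts.add(fact)              # verified now, so record it
            return True
        return False                          # main version disagrees


rebuilt = RebuildingSecondVersion({"F": 100})
good = rebuilt.check("F", 100)
bad = rebuilt.check("F", 200)
```

Because the old filter is discarded, the stale fact “F is 200 bytes” never has to be deleted explicitly, which sidesteps the problem that the second version cannot tell which of its facts is the wrong one.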

Page 33: Outline

✔ Introduction
✔ HARDFS Design
• HARDFS Implementation
• Evaluation
• Conclusion

Page 34: Implementation

• Hardens three functionalities of HDFS
  - Namespace management (HARDFS-N)
  - Replica management (HARDFS-R)
  - Read/write protocol of datanodes (HARDFS-D)

• Uses the 3 Bloom filter APIs
  - insert(“a fact”), delete(“a fact”), exists(“a fact”)

• Uses ask-then-check for non-boolean verification

Page 35: Protecting namespace integrity

• Guards namespace structures necessary for reaching data:
  - File hierarchy
  - File-to-block mapping
  - File length information

• Detects and recovers from namespace-related problems:
  - Corrupt file-to-block mapping
  - Unreachable files

Page 36: Namespace management

Message: Create(F): client requests NN to create F
Logic of the secondary version:
  Entry:
    If exists(F) Then reject;
    Else insert(F); generateAction(txCreate[F]);
  Return:
    check response;

Message: AddBlock(F): client requests NN to allocate a block to file F
Logic of the secondary version:
  Entry:
    F:X = ask-then-check(F);
  Return:
    B = addBlk(F);
    If exists(F) & !exists(B) Then
      X′ = X ∪ {B}; delete(F:X); insert(F:X′); insert(B@0);
    Else declare error;
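
The Create and AddBlock logic can be sketched in runnable form. This is an illustration under several assumptions: a set stands in for the Bloom filter, the fact encoding in `block_fact` is invented, an empty block list is recorded at create time so that the later ask-then-check has something to match, and `exists(B)` is read as "the fact B@0 is present". The meaning of `B@0` is not spelled out on the slide and is kept verbatim.

```python
def block_fact(path, blocks):
    # Encode "file `path` maps to exactly these blocks" as one fact string.
    return f"{path}:{','.join(sorted(blocks))}"


class SecondaryNamespace:
    def __init__(self):
        self.facts = set()            # stand-in for the counting Bloom filter
        self.expected_actions = []    # actions the main version must perform

    def create_entry(self, path):
        if path in self.facts:                       # If exists(F) Then reject
            return "reject"
        self.facts.add(path)                         # insert(F)
        # Assumption: record the empty block list so AddBlock can check it.
        self.facts.add(block_fact(path, []))
        self.expected_actions.append(f"txCreate[{path}]")
        return "accept"

    def add_block_entry(self, path, blocks_from_main):
        # ask-then-check: X comes from the main version and must match a fact.
        if block_fact(path, blocks_from_main) not in self.facts:
            raise RuntimeError(f"block list of {path} disagrees")
        return list(blocks_from_main)

    def add_block_return(self, path, old_blocks, new_block):
        if path not in self.facts or f"{new_block}@0" in self.facts:
            raise RuntimeError(f"bad AddBlock result for {path}")   # declare error
        self.facts.remove(block_fact(path, old_blocks))             # delete(F:X)
        self.facts.add(block_fact(path, old_blocks + [new_block]))  # insert(F:X')
        self.facts.add(f"{new_block}@0")                            # insert(B@0)


ns = SecondaryNamespace()
first_create = ns.create_entry("/F")
second_create = ns.create_entry("/F")
old_blocks = ns.add_block_entry("/F", [])
ns.add_block_return("/F", old_blocks, "blk_1")
```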

Page 37: Outline

✔ Introduction
✔ HARDFS Design
✔ HARDFS Implementation
• Evaluation and Conclusion

Page 38: Evaluation

• Is HARDFS robust against fail-silent faults?

• How much time and space overhead is incurred?

• Is micro-recovery efficient?

• How much engineering effort is required?

Page 39: Random memory corruption results

Outcome                   HDFS  HARDFS
Silent failure             117       9
Detect and reboot            -     140
Detect and micro-recover     -     107
Crash                      133     268
Hang                        22      16
No problem observed        728     460

• Number of fail-silent failures reduced by a factor of 10

• Crash happens twice as often
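
The two summary bullets can be checked against the outcome counts on this slide. The snippet below only redoes the arithmetic; treating the "-" cells as 0 is an assumption.

```python
# Outcome counts from the slide: (HDFS, HARDFS); "-" is treated as 0.
outcomes = {
    "Silent failure": (117, 9),
    "Detect and reboot": (0, 140),
    "Detect and micro-recover": (0, 107),
    "Crash": (133, 268),
    "Hang": (22, 16),
    "No problem observed": (728, 460),
}

hdfs_total = sum(hdfs for hdfs, _ in outcomes.values())
hardfs_total = sum(hardfs for _, hardfs in outcomes.values())

silent_reduction = outcomes["Silent failure"][0] / outcomes["Silent failure"][1]
crash_increase = outcomes["Crash"][1] / outcomes["Crash"][0]
```

Both columns sum to 1000 runs. Silent failures drop from 117 to 9, a factor of 13, and crashes rise from 133 to 268, almost exactly double, consistent with the rounded figures in the bullets.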

Page 40: Silent failures

Field              HDFS  HARDFS
pathname             95       0
replication           1       0
modification time     6       8
permission            3       0
block size           12       1

Page 41: Namespace management space overhead

[Figure: memory allocated (MB, 0 to 800) vs. file system size (number of files, 200K to 1000K) for HDFS, HARDFS + Concrete State, and HARDFS + Bloom Filters]

HARDFS with Bloom filters incurs little space overhead (2.6%)

Page 42: Recovery Time

[Figure: recovery time (seconds, log scale from 1 to 10000) vs. file system size (number of files, 200K to 1000K) for Reboot, Micro-recovery, and Optimized Micro-recovery]

• Rebooting NameNode is expensive
• Micro-recovery is 3 orders of magnitude faster

Page 43: Complexity (LOC)

Functionality          HDFS   HARDFS  HARDFS/HDFS
Namespace management  10114     1751          17%
Replica management     2342      934          40%
Read/write protocol    5050      944          19%
Others                13339        -            -

• Lightweight versions are smaller

Page 44: Injecting software bugs

Bug          Year  Priority  Description                                              HARDFS
HADOOP-1135  2007  Major     Blocks in block report wrongly marked for deletion       ✔
HADOOP-3002  2008  Blocker   Blocks removed during safemode                           ✔
HDFS-900     2010  Blocker   Valid replica deleted rather than corrupt replica        ✔
HDFS-1250    2010  Major     Namenode processes block report from dead datanode       ✔
HDFS-3087    2012  Critical  Decommission before replication during namenode restart  ✔

Page 45: Conclusion

• Crashing is good
• To die (and be reborn) is better than to lie
• But lies do happen in reality
• HARDFS turns lies into crashes
• Leverages existing crash recovery techniques to resurrect

Page 46: Thank you! Questions?

http://research.cs.wisc.edu/adsl/

http://ucare.cs.uchicago.edu/

http://wisdom.cs.wisc.edu/