44
Coerced Cache Evic-on and DiscreetMode Journaling: Dealing with Misbehaving Disks Abhishek Rajimwale * , Vijay Chidambaram , Deepak Ramamurthi Andrea ArpaciDusseau, Remzi ArpaciDusseau * Data Domain Inc University of Wisconsin Madison

Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Embed Size (px)

Citation preview

Page 1: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Coerced  Cache  Evic-on  and  Discreet-­‐Mode  Journaling:  Dealing  with  Misbehaving  Disks  

Abhishek  Rajimwale*,  Vijay  Chidambaram,  Deepak  Ramamurthi        Andrea  Arpaci-­‐Dusseau,  Remzi  Arpaci-­‐Dusseau  

*Data  Domain  Inc  University  of  Wisconsin  Madison  

Page 2: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disks  are  not  perfect  

DSN 11 2 3/13/12

•  Expanding  disk  fault  model  

•  Latent  Sector  Errors  [Bairavasundaram  SIGMETRICS  07]  –  RAID-­‐6  

•  Block  Corrup-on  [Bairavasundaram FAST 08]  –  Checksums  

•  The  disk  cache  – Always  trusted  so  far  

 Disk  Surface  

Disk  Cache  

Page 3: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disk  Caches  •  Disk  cache  improves  performance  –  But  at  the  risk  of  data  loss  

•  Order  of  writes  issued  by  file  system:  – A,  B  ,C  

•  Disks  reorder  writes  during  destaging:  –  B,  A,  C  

•  File  systems  flush  the  disk  cache  to  ensure  correct  ordering  of  writes  – A,  flush,  B,  flush,  C  

DSN 11 3 3/13/12

 Disk  Surface  

Disk  Cache  

Write  to  disk  

Page 4: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Problem:  Flushing  doesn’t  work  

•  Disks  can  fail  to  flush  data  upon  request  

•  One  reason:  Bugs  –  Errors  in  the  storage  stack  [Bairavasundaram  FAST  08]  

–  Improper  propaga-on  of  error  codes  [Bairavasundaram  FAST  08]  –  Inadequate  failure  policies  [Prabhakaran  SOSP  05]  –  Bugs  in  the  firmware  [Ghemawat  SOSP  03]  

DSN 11 4 3/13/12

Page 5: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disks  can  lie!  

DSN 11 5

0  5  10  15  20  25  30  35  40  45  50  

4k   16k   64k   128k   512k   1m  

Avg  6m

e  (m

sec)  

Write  size  

Sequen6al  writes  

w/  cache  

w/o  cache  

•  Misbehaving  disks  ignore  or  delay  flush  requests  •  Increases  risk  for  data  loss  -  File  systems  usually  blamed  for  such  loss  

   

3/13/12

Page 6: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disks  can  lie!  

DSN 11 6

F_FULLFSYNC  

•  From  the  fcntl  man  page  in  Mac  OSX:  Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data.

3/13/12

•  Evidence  from  industry  experts  – Microsoc  –  Seagate  

Page 7: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Ordering  points  are  essen-al  •  All  modern  file  systems  depend  on  ordering  points  –  Journaling  file  systems  (ext3,  ext4)  •  Data  before  the  commit  block  

–  Copy  on  write  file  systems  (ZFS)  •  Data  before  the  uber-­‐block  

•  If  ordering  points  are  not  enforced:  – Data  corrup-on  –  Inconsistent  file  system  

DSN 11 7 3/13/12

Page 8: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Summary  •  We  present  Coerced  Cache  Evic-on  (CCE)  – Write  extra  data  into  the  cache  to  evict  target  blocks  

•  We  show  how  to  characterize  9  SATA  disk  drive  cache  –  Examine  the  wide  range  of  caching  policies    

•  We  implement  CCE  in  ext3  – Well  known  journaling  file  system  

•  CCE  provides  stronger  enforcement  for  ordering  points  – At  acceptable  overheads  

DSN 11 8 3/13/12

Page 9: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Outline  

•  Mo-va-on  •  Background  •  Coerced  Cache  Evic-on  •  Cache  Fingerprin-ng  •  Discreet  Mode  Journaling  •  Evalua-on  •  Conclusion  

DSN 11 9 3/13/12

Page 10: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

File  System  Background  •  Consider  dele-ng  a  file  –  Removing  its  directory  entry  –  Freeing  the  space  occupied  by  the  file  and  its  metadata  

•  Journaling  file  system  – Makes  sure  all  changes  get  to  disk  or  none  do  – Groups  writes  into  transac-ons  – Writes  everything  to  a  log  first  –  Checkpoints  to  disk  later  

DSN 11 10 3/13/12

Page 11: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

File  System  Background  •  Ext3  file  system  –  Semi-­‐modern  journaling  file  system  – Well  known,  well  understood  

•  Variants  of  journaling  – Data  journaling  mode  •  Everything  (data,  metadata)  goes  to  the  log  first  

– Ordered  journaling  mode  •  Only  metadata  is  logged  

DSN 11 11 3/13/12

Page 12: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disk Surface Journal

Fixed locations

Data  Journaling  

DSN 11 12

D D D C M M Memory

3/13/12

B

Page 13: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disk Surface Journal

Fixed locations

Disk Cache

Data  Journaling  

DSN 11 13

D D D C M M Memory

3/13/12

B

Page 14: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Outline  

•  Mo-va-on  •  Background  •  Coerced  Cache  Evic-on  •  Cache  Fingerprin-ng  •  Discreet  Mode  Journaling  •  Evalua-on  •  Conclusion  

DSN 11 14 3/13/12

Page 15: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Coerced  Cache  Evic-on  •  Ensures  that  cache  has  been  truly  flushed    •  Key  idea:  –  Extra  writes  to  flush  the  disk  cache  – Desired  Order  of  writes:  A,  B,  C  – With  CCE:  • Write  A  • Write  to  flush  zone  • Write  B  • Write  to  flush  zone  • Write  C  

DSN 11 15 3/13/12

Page 16: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Disk Surface

Flush Zone

Disk Cache

Journal

Fixed locations

Coerced  Cache  Evic-on  

DSN 11 16

D D D C M M Memory F F F F F F F F

3/13/12

B F

Page 17: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Coerced  Cache  Evic-on  •  Desired  proper-es:  – High  probability  of  flushing  target  blocks  –  Low  performance  overhead  

•  Need  to  understand  the  disk  cache  to  design            the  flush  workload  

DSN 11 17 3/13/12

Page 18: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Outline  

•  Mo-va-on  •  Background  •  Coerced  Cache  Evic-on  •  Cache  Fingerprin-ng  •  Discreet  Mode  Journaling  •  Evalua-on  •  Conclusion  

DSN 11 18 3/13/12

Page 19: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  

•  Manufacturers  don’t  expose  details  about  disk  caches  

•  Disk  caches  can  vary  in:  –  Read/Write  par--on  size  – Number  of  segments  –  Replacement  policy  

•  Poorly  characterized  in  literature  

DSN 11 19 3/13/12

Disk  Cache  

Page 20: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  •  Flush  micro-­‐benchmark:  – Write  target  block  – Write  varied  flush  workload  –  measure  cost  –  fsync()  –  Read  target  –  infer  evic>on  

•  Micro-­‐benchmark  is  repeated    –  Probability  of  evic-on  is  calculated  

•  Vary  in  each  workload:  – Number  of  writes  – Amount  of  data  in  each  write  –  Sequen-al/Random  writes  

DSN 11 20 3/13/12

Page 21: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  

DSN 11 21 3/13/12

•  Evic-on  fingerprint  –  Probability  of  evic-on  is  visually  shown  – Darker  region  indicates  higher  probability  

 90  –  100%    70  -­‐  90      50  –  70    30  –  50    10  –  30    0  –  10  

Eviction Probability

Page 22: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  

DSN 11 22 3/13/12

•  Performance  fingerprint  –  Time  taken  to  write  flush  workload  – Darker  region  indicates  more  -me  

 500+  ms    100  –  500    50  –  100    10  -­‐  50    0  -­‐  10  

Flush Latency

Page 23: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  •  Selec-ng  a  flush  workload:  –  Combine  informa-on  from  both  fingerprints  – High  probability  of  evic-on    •  Dark  region  in  evic-on  fingerprint  

–  Low  performance  cost  •  Light  region  in  performance  fingerprint  

DSN 11 23 3/13/12

Page 24: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  Manufacturer   Cache  (MB)   Capacity  

(GB)  

Hitachi   8   80  

Hitachi   32   1024  

Samsung   8   250  

Samsung   16   250  

Western  Digital   16   320  

Western  Digital   64   800  

Seagate   8   250  

Seagate   16   320  

Seagate   32   750  

DSN 11 24 3/13/12

Page 25: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  Sequen-al  writes  may  be  ineffec-ve  at  flushing  –    Regardless  of  the  size  of  the  write  

 

A  number  of  random  writes  are  required  

DSN 11 25 3/13/12

 90  –  100%  

 70  -­‐  90    

 50  –  70  

 30  –  50  

 10  –  30  

 0  –  10  

Eviction Probability

Page 26: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  Ver-cal  stripes  indicate  that  the  cache  is  segmented  –  Each  write,  regardless  of  size,  is  sent  to  one  segment  

DSN 11 26 3/13/12

 90  –  100%  

 70  -­‐  90    

 50  –  70  

 30  –  50  

 10  –  30  

 0  –  10  

Eviction Probability

Page 27: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  Cache  behavior  of  disks  from  the  same  manufacturer  is  qualita-vely  similar  across  their  different  models    

DSN 11 27 3/13/12

 90  –  100%  

 70  -­‐  90    

 50  –  70  

 30  –  50  

 10  –  30  

 0  –  10  

Eviction Probability

Page 28: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  

DSN 11 28 3/13/12

•  It’s  not  all  good  news  however:  –  Some  caches  appear  to  use  random  replacement  policies  

–  For  such  caches,  we  cannot  evict  blocks  with  100%  certainty  

– A  large  number  of  random  writes  are  required  to  get  high  evic-on  probability  

 

Page 29: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Cache  Fingerprin-ng  -­‐  Results  Drive   Number  

of  writes  Total  Data  (MB)  

Evic6on  Probability  

Time  (s)  

Hitachi  8  MB   1   2.38   100   0.05  

Hitachi  32  MB   1   11   100   0.087  

Seagate  8  MB   256   31   100   0.87  

Seagate  16  MB   128   17   100   0.342  

Seagate  64  MB   128   37   100   0.396  

Samsung  8  MB   128   49   ~  90   1.328  

Samsung  16  MB   256   128   ~  90   2.872  

Western  Digital  16  MB   1792   19   ~  90   5.107  

Western  Digital  64  MB   256   1   100   7.705  

DSN 11 29 3/13/12

Page 30: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Outline  

•  Mo-va-on  •  Background  •  Coerced  Cache  Evic-on  •  Cache  Fingerprin-ng  •  Discreet  Mode  Journaling  •  Evalua-on  •  Conclusion  

DSN 11 30 3/13/12

Page 31: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Discreet  Mode  Journaling  

•  Incorpora-ng  CCE  into  ext3  –  Fingerprint  the  disk  to  find  op-mal  flush  workload  –  Create  flush  zone  with  suitable  size  – Modify  ext3  to  issue  flush  zone  writes:  •  One  at  each  ordering  point  •  #  of  CCE  opera-ons  =  #  of  ordering  points  

•  Can  be  used  with  any  disk:  –   As  long  as  the  disk  is  fingerprinted  first  

DSN 11 31 3/13/12

Page 32: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Outline  

•  Mo-va-on  •  Background  •  Coerced  Cache  Evic-on  •  Cache  Fingerprin-ng  •  Discreet  Mode  Journaling  •  Evalua-on  •  Conclusion  

DSN 11 32 3/13/12

Page 33: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  •  Goal:  –  CCE  provides  higher  reliability  – At  what  cost?  Is  it  prac-cal  to  use?  

•  Experimental  setup:  –  File  system:  Ext3  – Disk:  Hitachi  8  MB  –  Journaling  mode:  Data  journaling  •  (See  paper  for  ordered  journaling  results)  

– Opera-ng  system:  Linux  2.6.13,  Linux  2.6.23  

DSN 11 33 3/13/12

Page 34: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  •  What  we  compare:  –  Regular  journaling  with  disk  cache  turned  off  •  “Safe”  but  slow  •  Disk  might  not  obey  command  to  turn  off  cache!  

–  Regular  journaling  with  disk  cache  turned  on  •  Unsafe  but  fast  

– Discreet  mode  journaling  • Midway  op-on  –  Safe  but  with  cost  

DSN 11 34 3/13/12

Page 35: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  

DSN 11 35 3/13/12

•  Benchmarks:  – OpenSSH  •   copy,  untar,  configure,  make  

–  Postmark  •  Simulates  a  mail  server  •  Single  threaded  

–  Filebench  Webserver  •   I/O  intensive  

–  Filebench  Varmail  • Mul-threaded  postmark  

Page 36: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  –  OpenSSH  

DSN 11 36

Data  Journaling  Mode  

3/13/12

0  5  

10  15  20  25  30  35  40  45  

Time  (s)  

regular  w/o  cache  discreet    regular  w/  cache  

Page 37: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  –  Postmark  

DSN 11 37

Data  Journaling  Mode  

3/13/12

0  100  200  300  400  500  600  700  800  

Time  (s)  

regular  w/o  cache  discreet    regular  w/  cache  

Page 38: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  –  Filebench  Webserver  

DSN 11 38

Data  Journaling  Mode  

3/13/12

0  

50  

100  

150  

200  

250  

300  

Throughtpu

t  (MB/s)  

regular  w/o  cache  discreet    regular  w/  cache  

Page 39: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  –  Filebench  Varmail  

DSN 11 39

Data  Journaling  Mode  

3/13/12

0  

2  

4  

6  

8  

10  

12  

Throughtpu

t  (MB/s)  

regular  w/o  cache  discreet    regular  w/  cache  

Page 40: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  –  Filebench  varmail  •  Workload  writes  a  small  amount  of  data  and  calls  fsync()  repeatedly  

•  Each  fsync()causes  3  CCEs  

•  Number  of  op-miza-ons  :  –  Incorporate  Group  Commit  in  varmail  •  Improves  throughput  for  all  modes  

– We  use  a  few  other  techniques  as  well  (see  paper)  

DSN 11 40 3/13/12

Page 41: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Evalua-on  –  Filebench  Varmail  

DSN 11 41

0  

5  

10  

15  

20  

25  

Throughp

ut  (M

B/s)  

With  op-miza-ons  

3/13/12

Original  performance  

0  

5  

10  

15  

20  

25  

Throughtpu

t  (MB/s)  

regular  w/o  cache  discreet    regular  w/  cache  

Page 42: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Summary  

•  Coerced  Cache  Evic-on  (CCE):  –   Run  file  systems  reliably  on  top  of  misbehaving  disks  

•  Characteriza-on  of  9  SATA  disk  caches  through  fingerprints  •  Discreet  Mode  Journaling:  –  Implementa-on  of  CCE  for  ext3  filesystem  – Acceptable  performance  on  3  workloads  •  Only  if  the  cache  doesn’t  use  random  replacement  

– High  overhead  for  apps  which  call  fsync()  frequently  

DSN 11 42 3/13/12

Page 43: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

Conclusion  

•  Trust  in  disk  is  weakening:  –  Latent  Sector  Errors  –  Block  corrup-on  –  Cache  flushing  

•  Cloud  compu-ng  systems:  –  Virtualized  hardware  –  Large  socware  stack  

•  Can  such  hardware  be  trusted?    •  Will  coercion  be  more  widely  used?  

DSN 11 43 3/13/12

Page 44: Coerced Cache Eviction: Dealing with Misbehaving Disks through Discreet-Mode Journaling

DSN 11 44

Thank you!

3/13/12

Advanced  Systems  Lab  (ADSL)  University  of  Wisconsin-­‐Madison  hEp://www.cs.wisc.edu/adsl