UC Santa Cruz Reliability of MEMS-Based Storage Enclosures Bo Hong, Thomas J. E. Schwarz, S. J. * Scott A. Brandt, Darrell D. E. Long Storage Systems Research Center University of California,

UC Santa Cruz

Reliability of MEMS-Based Storage Enclosures

Bo Hong, Thomas J. E. Schwarz, S. J.*

Scott A. Brandt, Darrell D. E. Long

Storage Systems Research Center

University of California, Santa Cruz

*Also Santa Clara University, Santa Clara, CA

MEMS Storage Technology

Micro-Electro-Mechanical Systems (MEMS) storage
• A promising alternative secondary storage technology
• Hardware research: IBM, HP, CMU, Nanochip
• Magnetic storage, but very different mechanics

[Figure: MEMS storage device; label: Spring]

MEMS Storage Technology

MEMS-based storage vs. Magnetic Disk
• Also provides non-volatile storage
• Delivers 10× faster access time (< 1 ms)
• Delivers higher bandwidth (100 MB/s – 1 GB/s)
• Small (about the size of a penny)
• Consumes 100× less power
• Costs ~10 USD per device
• Expected to be more reliable
• Stores a limited amount of data per device (3-10 GB)

A serious alternative to disk drives, in particular for mobile computing applications

Reliability Implication of MEMS-based Storage

Storage systems built from MEMS-based storage …
• Require more MEMS devices
  - At least 10 times the number of disks to meet capacity requirements
• Require more connection components

Reliability implication
• More components, hence (?) lower reliability

MEMS Storage Enclosure

Our proposal: MEMS Enclosures
• A device with dozens of MEMS devices
• Single interface to rest of system
• Might be serviceable, but service calls during economic lifetime should be very rare

[Figure: MEMS storage enclosure; label: Interface]

MEMS Storage Enclosures

Reliability is an issue:
• MTTF of 1-2 years without redundant data storage

Uses RAID Level 5 technology with distributed sparing
• Additional k spares

Calls for service when necessary
• i.e., when we run out of spares

Organization and number of spares can
• Decrease the data recovery time and thus improve reliability
• Reduce human interference
  - No servicing errors
  - Reduced maintenance costs

MEMS Enclosure Reliability

Measure MTBF for enclosures
• Without replacing spares
• With replacing spares (service calls)

Determine the number of failures that trigger a service call
• Mandatory replacement: no redundancy left
• Preventive replacement: no spare left

MEMS Enclosure Reliability without Replacement

Configuration:
• MTTF_disk = 11.5 or 23 yrs
• MTTF_MEMS = 23 yrs
• 19 data + 1 parity + k dedicated spares
• 15-minute data recovery

[Figure: curves for a disk (MTTF 11.5 and 23 yrs) and for enclosures with 0 to 5 spares, each labeled with its MTTF]

Enclosure MTTF by number of spares:
• No spare: 2.3 yrs
• 1 spare: 3.5 yrs
• 2 spares: 4.6 yrs
• 3 spares: 5.8 yrs
• 4 spares: 6.9 yrs
• 5 spares: 8.1 yrs

MTTF alone is not enough to measure the reliability of enclosures without repairs

Instead: focus on data reliability during the economic lifetimes (3-5 years) of enclosures
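The enclosure MTTF figures on this slide can be approximated with a short calculation. The sketch below is not taken from the slides: it assumes exponentially distributed device lifetimes, 20 active devices (19 data + 1 parity), and a recovery window short enough (15 minutes) that data loss during a rebuild can be ignored. The function name `enclosure_mttf_years` is hypothetical.

```python
def enclosure_mttf_years(active_devices, spares, device_mttf_years):
    """Approximate MTTF (years) of a single-parity enclosure that is never serviced.

    Assumption (not from the slides): exponential device lifetimes, and a
    rebuild so fast that a second failure during recovery is negligible.
    """
    failure_rate = 1.0 / device_mttf_years  # per device, per year
    # Each of the first (spares + 1) failures strikes a full set of active
    # devices; a spare restores the full set after all but the last of them.
    full_set_time = (spares + 1) / (active_devices * failure_rate)
    # The fatal failure then strikes one of the active_devices - 1 survivors.
    degraded_time = 1.0 / ((active_devices - 1) * failure_rate)
    return full_set_time + degraded_time


if __name__ == "__main__":
    for k in range(6):
        mttf = enclosure_mttf_years(20, k, 23.0)
        print(k, "spares:", round(mttf, 2), "years")
```

Under these assumptions each result lands within about 0.1 years of the MTTF the slide lists for the same number of spares, which suggests that spares mostly add waiting time before the final, unprotected failure.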

MEMS Enclosures with Replacement

Markov model for a MEMS enclosure with N data, one parity, and one dedicated spare device
• N – Normal; D – Degraded; DL – Data Loss
• 1/λ – MTTF_MEMS (in tens of years)
• 1/µ – Mean Time Between Recovery (in minutes)
• 1/ν – Mean Time Between Replacement (in days, weeks)

[Figure: Markov state diagram with preventive and mandatory replacement; labels: Preventive replacement, Mandatory replacement]
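The state diagram is not reproduced in the text, so the following is only a minimal sketch of how such a Markov model can be evaluated. The four transient states and their transitions are assumptions, not the authors' model: a replacement is taken to restore the enclosure directly to the normal state with one spare, and the names `lam`, `mu`, `nu`, `solve`, and `mttdl_one_spare` are hypothetical. It computes the mean time to data loss by solving the linear system for expected absorption times.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mttdl_one_spare(n_data, mttf_years, recovery_minutes,
                    replace_days=None, preventive=True):
    """Mean time to data loss (years) for n_data + 1 parity + 1 dedicated spare."""
    lam = 1.0 / mttf_years                        # device failure rate
    mu = 365.0 * 24 * 60 / recovery_minutes       # data recovery rate
    nu = 0.0 if replace_days is None else 365.0 / replace_days  # replacement rate
    active = n_data + 1
    # Assumed transient states: 0 normal with spare, 1 degraded and rebuilding,
    # 2 normal without spare, 3 degraded without spare. "DL" is absorbing.
    rates = [
        {1: active * lam},
        {2: mu, "DL": (active - 1) * lam},
        {3: active * lam, 0: nu if preventive else 0.0},
        {0: nu, "DL": (active - 1) * lam},
    ]
    size = len(rates)
    a = [[0.0] * size for _ in range(size)]
    for i, out in enumerate(rates):
        a[i][i] = sum(out.values())
        for target, rate in out.items():
            if target != "DL":
                a[i][target] -= rate
    # Expected times to absorption satisfy (-Q) tau = 1 on the transient states.
    return solve(a, [1.0] * size)[0]
```

With no replacement at all (`replace_days=None`), 19 data devices, a 23-year device MTTF, and 15-minute recovery, this sketch gives roughly 3.5 years. Setting `replace_days` raises the result, and more so with `preventive=True` than with mandatory replacement only.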

MEMS Enclosure Reliability with Replacement

Preventive replacement increases reliability and reduces replacement urgency

[Figure: enclosure reliability with replacement; curves for no spare and for 1, 2, and 3 spares under the mandatory and the preventive + mandatory replacement policies]

MEMS Enclosure Reliability

Dedicated Sparing
• Rebuild all data from a failed MEMS device onto a single spare MEMS device

Distributed Sparing
• Every device contains
  - Client data
  - Parity data
  - Spare space

Distributed Sparing [Menon and Mattson 1992]

[Figure: data layout before failure and after MEMS 4 fails (failed device marked X)]

• Shorter data recovery time
• More devices can fail
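To illustrate why distributed sparing shortens recovery, here is a small sketch of a rotated layout. The rotation rule and the function names `layout` and `rebuild_targets` are assumptions for illustration, not the layout drawn on the slide: each stripe holds one parity block and one spare block, and both rotate across the devices.

```python
def layout(n_devices, n_stripes):
    """Return one row per stripe with block roles: 'P' parity, 'S' spare, 'D' data."""
    rows = []
    for stripe in range(n_stripes):
        parity = (n_devices - 1 - stripe) % n_devices
        spare = (n_devices - 2 - stripe) % n_devices
        rows.append(["P" if d == parity else "S" if d == spare else "D"
                     for d in range(n_devices)])
    return rows


def rebuild_targets(rows, failed):
    """For each stripe, return the device whose spare block receives the rebuilt block.

    None means the failed device held that stripe's spare, so nothing is rebuilt.
    """
    targets = []
    for row in rows:
        targets.append(None if row[failed] == "S" else row.index("S"))
    return targets


if __name__ == "__main__":
    rows = layout(5, 5)
    for row in rows:
        print(" ".join(row))
    print(rebuild_targets(rows, failed=3))
```

Because the spare blocks sit on different devices, the rebuild writes are spread over all survivors instead of queuing on one dedicated spare device.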

Reliability Comparison: Dedicated Sparing vs. Distributed Sparing

[Figure: reliability with dedicated sparing; curves for no spare and for 1 and 2 spares under the mandatory and the preventive + mandatory replacement policies]

Compare with following slide

Reliability Comparison: Dedicated Sparing vs. Distributed Sparing

Distributed sparing only better at short replacement times when using preventive replacement

[Figure: reliability with dedicated and distributed sparing; curves for no spare and for 1 and 2 spares under the mandatory and the preventive + mandatory replacement policies]

Durability of MEMS Storage Enclosures

All about economy
• How long can MEMS enclosures work without repairs?
• How often do they need repairing in the first 3-5 years?
• How do replacement policies affect maintenance frequency?

Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m >= 0):
• (m + 1) × k, under the preventive replacement policy
• (m + 1) × (k + 1), under the mandatory replacement policy
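The two failure-count formulas can be written as a small helper. This is a minimal sketch; the function name `failures_before_repair` and the policy strings are hypothetical, and only the two formulas come from the slide.

```python
def failures_before_repair(k, m, policy):
    """Failures an enclosure with k spares tolerates before the (m+1)th repair.

    k: number of spares; m: number of repairs already performed (m >= 0).
    """
    if k < 0 or m < 0:
        raise ValueError("k and m must be non-negative")
    if policy == "preventive":
        # Service is requested as soon as no spare is left.
        return (m + 1) * k
    if policy == "mandatory":
        # Service is requested only when no redundancy is left.
        return (m + 1) * (k + 1)
    raise ValueError("policy must be 'preventive' or 'mandatory'")


if __name__ == "__main__":
    for policy in ("preventive", "mandatory"):
        print(policy, [failures_before_repair(4, m, policy) for m in range(3)])
```

For an enclosure with four spares, the second repair (m = 1) is scheduled after 8 failures under preventive replacement and after 10 under mandatory replacement, which is why the preventive policy calls for service more often.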

Durability of MEMS Storage Enclosures

Probabilities that a MEMS storage enclosure has up to k failures during (0, t]

[Figure: probability curves for no failure and for up to 1, 2, 4, 6, 8, and 10 failures, with a disk (MTTF 23 yrs) for comparison]

First-year survivability: 95.7% for a disk vs. 98.8% for MEMS enclosures with two spares

Chance that MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
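The first-year figures can be approximated with a Poisson failure count. The slides do not state the underlying model, so the sketch below rests on assumptions: exponential device lifetimes with a 23-year MTTF, 20 active devices (19 data + 1 parity) kept at full strength by spares, and an enclosure with k spares that keeps its data through k + 1 failures. The name `prob_at_most` is hypothetical.

```python
import math


def prob_at_most(max_failures, failures_per_year, years):
    """Poisson probability of at most max_failures failures during (0, years]."""
    mean = failures_per_year * years
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(max_failures + 1))


DEVICE_MTTF_YEARS = 23.0   # assumed for both the disk and each MEMS device
ACTIVE_DEVICES = 20        # 19 data + 1 parity (assumed configuration)

# A single disk survives the first year only if it sees no failure.
disk_first_year = prob_at_most(0, 1.0 / DEVICE_MTTF_YEARS, 1.0)

# An enclosure with two spares survives up to 2 + 1 device failures.
enclosure_first_year = prob_at_most(2 + 1, ACTIVE_DEVICES / DEVICE_MTTF_YEARS, 1.0)

if __name__ == "__main__":
    print(round(disk_first_year, 3), round(enclosure_first_year, 3))
```

Under these assumptions the two values come out at roughly 0.957 and 0.988, in line with the first-year survivability quoted on the slide. The same function applied to five years gives service-call probabilities close to, but not exactly, the slide's 3.5% and 0.6%, so the authors' model presumably differs in detail.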

Related Work

MEMS-based storage technology development
• IBM, HP, CMU CHI2PS, Nanochip

Digital Micromirror Devices by TI
• Reported Mean Time Between Failure: 650,000 hours [Douglass]

RAID reliability
• Dedicated sparing [Dunphy et al.]
• Distributed sparing [Menon and Mattson]
• Parity sparing [Reddy and Banerjee]

Disk failure prediction
• S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology)

Summary

Reliability of MEMS storage enclosures
• Can be more reliable than disks even without failed device replacement
• Highly reliable when using preventive replacement
• Dedicated sparing and distributed sparing provide comparable or almost identical reliability

Economy of MEMS storage enclosures
• Preventive replacement trades more maintenance services for higher reliability

Thank You!

Acknowledgements
• Dave Nagle, Greg Ganger, CMU PDL
• The rest of the UCSC SSRC

More information:
• http://ssrc.cse.ucsc.edu
• http://ssrc.cse.ucsc.edu/mems.shtml

Questions?

Backup Slides

MEMS Storage Technology

Micro-Electro-Mechanical Systems (MEMS) storage
• A promising alternative secondary storage technology
• Hardware research: IBM, HP, CMU, Nanochip

Radical differences between MEMS storage and magnetic disk technologies

                      Disk           MEMS
Recording media       Magnetic       Magnetic or physical (non-volatile)
Recording technique   Longitudinal   Orthogonal (higher density)
R/W head              Single         Thousands – tip array (higher bandwidth and parallelism)
Media movement        Rotation       Media sled moves in X and Y independently (no rotation delay)

MEMS Storage Device Characteristics

Physical size: 1–2 cm²

Recording density: 250–750 Gb/in²

[Figure: Predicted Performance in 2005; throughput (1–7 GB/s) versus access latency (1 ns – 10 ms) for DRAM, MEMS, and disk]
• DRAM: 0.5–2 GB, $100–$200/GB
• MEMS: 3–10 GB, $5–$50/GB
• Disk: 100–500 GB, $1–$2/GB

MEMS Storage Device

[Figure: MEMS storage device; labels: Spring, X, Y]

Durability of MEMS Storage Enclosures

Probabilities that a MEMS storage enclosure has up to k failures during (0, t]

[Figure: probability curves for no failure and for up to 1, 2, 4, 6, 8, and 10 failures, with a disk (MTTF 23 yrs) for comparison]