FLASH Mitigation Strategies for Space Applications
Charles Howard, Southwest Research Institute
Abstract
The MMS mission requires a high-density, non-volatile solid state recorder (SSR). The SSR will be implemented with screened commercial FLASH devices that have been characterized for radiation effects (both TID and SEE). In an extensive collaborative effort by NEPP and SWRI, multiple manufacturers and devices have been characterized. The additional SEU failure modes exhibited by FLASH devices compel mitigation techniques that extend beyond traditional bit error correction. Mitigation techniques are discussed, along with the tradeoffs between FPGA complexity/utilization, bandwidth, and total memory.
Why FLASH?
“I am your density” – George McFly, Back to the Future
SDRAM
– 512Mx8 in an MCM?
SRAM
– Yeah, right…
FLASH
– 512Mx8 discrete parts (1Gx8 available)
– 4Gx8 MCMs (8Gx8 possible)
– NON-VOLATILE…
Why NOT FLASH?
Space qualified parts?
– General availability sorely lacking
– No Rad foundry providing FLASH
Legacy / Lack thereof
– Radiation testing of commercial products is a strenuous process…
– Each wafer lot must be tested
– “Long term” availability for commodity parts? NOT!
NEPP/SWRI testing of FLASH
SEE response is generally excellent for all flash products
– Error cross-sections orders of magnitude lower than for standard volatile memories
None of the parts suffered SEL
– There were other destructive effects, usually failure of the erase circuit.
The SEFI rate is a concern with flash memories.
– What do you call a SEFI that won’t clear after a power cycle?
FLASH Memory in Space Environment
“The SEFI (Single Event Functional Interrupt) rate is of greater concern for space applications than the bit error rate”
– TID and SEE Response of Advanced 4G NAND Flash Memories, NSREC08, T. R. Oldham
Mitigation Considerations
Class of Error
– SEUs
– SEL
– SEFI
– “Permanent” SEFI
Cost of implementation/mitigation
– Area
– Mass
– Power
– Required FPGA logic
Error Classes
SEU
– Address to satisfy MAR
– Mitigation: some form of ECC
SEL
– Sufficiently low to neglect
– Component design issue
SEFI (part becomes nominal after power cycle/reset)
– More likely than SEU, must address
– Mitigation: detect & power cycle/reset
Permanent SEFI
– More likely than SEU, must address
– Different mitigation approach!
– Mitigation: ???
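The four error classes and the responses listed above can be sketched as a small lookup. This is only an illustration: the names `ErrorClass`, `RESPONSE`, `classify_sefi` and `respond` are hypothetical, and the permanent-SEFI response is deliberately left undecided because the slide leaves it open.

```python
from enum import Enum, auto


class ErrorClass(Enum):
    SEU = auto()             # single-event upset (bit error)
    SEL = auto()             # single-event latchup
    SEFI = auto()            # part becomes nominal after power cycle/reset
    PERMANENT_SEFI = auto()  # does not clear after a power cycle


# Hypothetical response labels taken from the bullets above.
RESPONSE = {
    ErrorClass.SEU: "correct with ECC",
    ErrorClass.SEL: "neglect (rate sufficiently low, component design issue)",
    ErrorClass.SEFI: "detect, then power cycle/reset",
    ErrorClass.PERMANENT_SEFI: "undecided: needs a different mitigation approach",
}


def classify_sefi(cleared_after_power_cycle):
    """A SEFI that survives a power cycle is treated as permanent."""
    if cleared_after_power_cycle:
        return ErrorClass.SEFI
    return ErrorClass.PERMANENT_SEFI


def respond(error):
    """Look up the response for an error class."""
    return RESPONSE[error]
```

The point of the split is that only the last class cannot be handled by either ECC or a reset, which is what motivates the design options that follow.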
Module Topology
[Figure: 4 GByte module. Eight 512Mx8 FLASH devices form one 4Gx8 FLASH module, with Power Control and Ctrl+Data connections.]
CAVEAT STATEMENTS
I am not doing the probability calculations
Consider a DWORD storage system for reference
Permanent SEFIs are not recoverable:
– Loss of Erase, Write or Read Circuit
– Can approximate the loss of a component
Block based failures and permanent SEFIs are roughly equivalent
– Lose a “unit” of data (BLOCK x 4 x n) ~ “component”
Simple addressing and memory management
– No exotic stuff like link listing
Design Options
Unmitigated
SEC/DED (Traditional EDAC)
Reed-Solomon
Parallel Reed-Solomon
TMR
Redundancy
ECC “Plus”
Unmitigated
0% more memory
– Area / Power / Mass: 1x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization -- logic required to implement
– I/O count: 1x
– Gates: Baseline
Susceptibility
– Bit: Any Single Bit Error
– Byte or component: NOPE…
SEC/DED
25% more memory
– Area / Power / Mass: 1.25x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization -- logic required to implement
– I/O count: 1.25x
– Gates: Hamming cost
Susceptibility (Immunity)
– Bit: Any Single Bit Error
– Byte or component: NOPE…
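To make the “Hamming cost” concrete, here is a minimal SEC/DED sketch. It assumes a 32-bit DWORD protected by one check byte (6 Hamming bits plus 1 overall parity bit, with 1 bit unused), which matches the 25% overhead above. The bit layout and the names `encode` and `decode` are illustrative assumptions, not the flight design.

```python
DATA_BITS = 32
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32)          # Hamming check-bit positions
CODE_LEN = DATA_BITS + len(PARITY_POSITIONS)     # 38 positions, numbered from 1
DATA_POSITIONS = [p for p in range(1, CODE_LEN + 1) if p not in PARITY_POSITIONS]


def _ones(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def _data_syndrome(dword):
    """XOR of the codeword positions of every set data bit."""
    s = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (dword >> i) & 1:
            s ^= pos
    return s


def encode(dword):
    """Return the check byte for a 32-bit word."""
    hamming = _data_syndrome(dword)          # bit k is the check bit at position 2**k
    overall = (_ones(dword) + _ones(hamming)) & 1
    return hamming | (overall << 6)          # bit 6 makes total parity even


def decode(dword, check):
    """Return (word, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = _data_syndrome(dword) ^ (check & 0x3F)
    parity_odd = (_ones(dword) + _ones(check & 0x7F)) & 1
    if syndrome == 0 and not parity_odd:
        return dword, "ok"
    if parity_odd:                           # odd number of flips: assume one
        if syndrome in DATA_POSITIONS:
            return dword ^ (1 << DATA_POSITIONS.index(syndrome)), "corrected"
        if syndrome == 0 or syndrome in PARITY_POSITIONS:
            return dword, "corrected"        # the flipped bit was a check bit
    return dword, "uncorrectable"            # double error (or worse) detected
```

A single flipped bit anywhere in the 39 stored bits is corrected, and any two flipped bits are flagged. Losing a whole byte-wide device is well beyond what this code can repair, which is the “NOPE…” in the last bullet.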
Reed Solomon (Block)
25% more memory
– Area / Power / Mass: 1.25x
Implementation concerns
– Addressing scheme: Straightforward
– Memory management metrics: Simple
Utilization -- logic required to implement
– I/O count: 1.25x
– Gates: Encoder/Decoder/RAM
– Bandwidth: Likely Adverse
Susceptibility (Immunity)
– Bit, byte: Many/codeblock
– Component failures: NOPE…
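“Many/codeblock” can be put into numbers with the standard Reed-Solomon bound: a code with n total symbols and k data symbols corrects up to (n - k) // 2 symbol errors per codeblock. The sketch below applies it to RS(255, 204), a hypothetical parameter choice picked only because it gives exactly 25% overhead. The slides do not state which code was considered.

```python
def rs_correctable_symbols(n, k):
    """Maximum symbol errors an RS(n, k) code can correct per codeblock."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return (n - k) // 2


def rs_overhead(n, k):
    """Extra memory relative to the data, e.g. 0.25 for 25% more memory."""
    return (n - k) / k


# Hypothetical example: 204 data bytes plus 51 parity bytes per codeblock.
N, K = 255, 204
CORRECTABLE = rs_correctable_symbols(N, K)   # 25 byte errors per codeblock
OVERHEAD = rs_overhead(N, K)                 # 0.25
```

The correction capacity is per codeblock, so a failed component that corrupts every codeblock it touches can still exceed it, consistent with the “NOPE…” for component failures.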
Parallel Reed Solomon
50% more memory
– Area / Power / Mass: 1.5x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics
Utilization -- logic required to implement
– I/O count: 1.5x
– Gates: Encoder/Decoder
Susceptibility (Immunity)
– Bit, byte, byte “plus”: YEAH!
– SOME component failures: 2/3 (NOT IN THE RS)
TMR
200% more memory
– Area / Power / Mass: 3x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization -- logic required to implement
– I/O count: 3X or TDM
– Bus loading / signal integrity: Ouch…
– Gates: Voters (plus)
Susceptibility (Immunity)
– Bit, byte or component: OH, YEAH! We can handle anything!
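The “Voters” in the gate count are 2-of-3 majority functions applied to each bit. A minimal sketch, assuming three independently stored copies of each word (the function name `vote` is illustrative):

```python
def vote(a, b, c):
    """Bitwise 2-of-3 majority of three copies of the same word."""
    return (a & b) | (a & c) | (b & c)


# One copy lost entirely (a failed component reading all ones):
GOOD = 0xA5
RECOVERED = vote(0xFF, GOOD, GOOD)           # still 0xA5

# Different bits upset in two different copies are also outvoted,
# because each bit position is voted on independently.
MIXED = vote(GOOD ^ 0x01, GOOD ^ 0x10, GOOD)  # still 0xA5
```

The vote fails only when two copies are wrong in the same bit position, which is why TMR tolerates the loss of a whole component at the price of three times the memory.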
Redundant Memory
X% more memory
– Area / Power / Mass: X
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization -- logic required to implement
– I/O count: X
– Gates: Minimal
Susceptibility (Immunity)
– Bit, byte or component: Nope.
ECC with Warm Spare
25-50% more memory per dword
– Area / Power / Mass: 1.5x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Straightforward
Utilization -- logic required to implement
– I/O count: 1.5x
– Bus loading / signal integrity
– Gates: ECC & steering
Susceptibility (Immunity)
– Bit, byte or component: OH, YEAH! We can handle anything!
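The “steering” logic can be pictured as a small table that maps each logical lane of the DWORD to a physical device and redirects a lane to the spare when its device fails. The sketch below assumes six byte-wide devices per DWORD (four data lanes, one ECC lane, one spare). The class name `SteeringTable` and its methods are hypothetical, and rebuilding the lost lane's contents is left out.

```python
LANES = ("BYTE0", "BYTE1", "BYTE2", "BYTE3", "ECC")


class SteeringTable:
    """Maps each logical lane to a physical device; device 5 starts as the spare."""

    def __init__(self):
        self.device_for = {lane: i for i, lane in enumerate(LANES)}
        self.spare = len(LANES)              # physical index 5, None once used

    def retire(self, lane):
        """Steer a lane whose device has failed onto the spare.

        Returns the physical index of the retired device. Rebuilding the
        lane's contents (for example from ECC) is outside this sketch.
        """
        if self.spare is None:
            raise RuntimeError("spare already in use")
        failed = self.device_for[lane]
        self.device_for[lane] = self.spare
        self.spare = None
        return failed


table = SteeringTable()
table.retire("BYTE0")                        # BYTE0 now lives on device 5
```

With only one spare per DWORD, a second component failure in the same group falls back on ECC alone, so this sketch raises an error rather than guessing at a policy.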
Memory Topology
[Figure: Mass Memory Flash Array (48GB + EDAC) made up of three Power Sectors (16GB + EDAC each), each with its own Power Control, plus a Ctrl+Data connection. Device labels within a sector: BYTE0, BYTE1, BYTE2, BYTE3, ECC, SPARE. Revised 07/25/09.]
Failure 1
[Figure: Mass Memory Flash Array (48GB + EDAC) with three Power Sectors (16GB + EDAC each), after a failure affecting BYTE0. Device labels within a sector: BYTE0, BYTE1, BYTE2, BYTE3, ECC, and the former SPARE now labeled BYTE0. Revised 07/25/09.]
Failure 2
[Figure: Mass Memory Flash Array (48GB + EDAC) with three Power Sectors (16GB + EDAC each), after a failure affecting ECC. Device labels within a sector: BYTE0, BYTE1, BYTE2, BYTE3, ECC, and the former SPARE now labeled ECC. Revised 07/25/09.]
Observations
ECC covers SEU errors
Warm Spare compensates for SEFIs and block errors
ECC with Warm Spare is a superior option
– Susceptibility to permanent SEFIs plummets
– Memory availability remains near 100%
Block based errors map to the spare
SEFI based errors map to the spare
ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost
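The “half the power, mass, area, and cost” claim follows from counting byte-wide devices per 32-bit DWORD. The sketch below assumes that all four costs scale with device count, which is a simplification. The device counts are inferred from the 1x, 1.25x, 3x and 1.5x figures on the design-option slides.

```python
DATA_DEVICES = 4   # one byte-wide device per byte of a DWORD

# Devices needed per DWORD for each option (assumed from the slide multipliers).
DEVICES = {
    "Unmitigated": DATA_DEVICES,
    "SEC/DED": DATA_DEVICES + 1,                  # plus one ECC device
    "TMR": 3 * DATA_DEVICES,                      # three full copies
    "ECC with Warm Spare": DATA_DEVICES + 1 + 1,  # plus ECC and one spare
}


def relative_cost(option):
    """Device count relative to the unmitigated design."""
    return DEVICES[option] / DATA_DEVICES


RATIO = relative_cost("ECC with Warm Spare") / relative_cost("TMR")   # 0.5
```

Six devices against twelve gives the factor of two. The comparison covers hardware cost only and does not capture differences in logic, bandwidth, or failure coverage.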
Summary
Memory modules allow highest density/area
Mitigation is user’s choice depending upon design goals but must cover SEFI and SEU
ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost
References
T. R. Oldham, M. Friendlich, J. W. Howard, Jr., M. D. Berg, H. S. Kim, T. L. Irwin, and K. A. LaBel, “TID and SEE Response of an Advanced Samsung 4Gb NAND Flash Memory,” NSREC07.
T. R. Oldham, M. Suhail, M. R. Friendlich, M. A. Carts, R. L. Ladbury, H. S. Kim, M. D. Berg, C. Poivey, S. P. Buchner, A. B. Sanders, C. M. Seidleck, and K. A. LaBel, “TID and SEE Response of Advanced 4G NAND Flash Memories,” NSREC08.
D. N. Nguyen and L. Z. Scheick, “SEE and TID of Emerging Non-Volatile Memories,” Jet Propulsion Laboratory, California Institute of Technology, http://parts.jpl.nasa.gov/docs/PID16621.pdf
Joe Benedetto and George Ott, “A Case Study of Single Event Functional Interrupts (SEFIs) in COTS SDRAMS,” Radiation Assured Devices, NSREC08.