Quality of Service Implications of the Error Correction Techniques in Solid State Drives
Lorenzo Zuolo†, Cristian Zambelli†, Rino Micheloni‡, Stephen Bates‡ and Piero Olivo†
†Dipartimento di Ingegneria – Università di Ferrara (Italy)
‡PMC-Sierra
[email protected]
Solid State Drives (SSDs) are now the most effective solution for mass storage applications
• High performance
  • Read and write bandwidth up to GB/s
  • Million IOPS
• High robustness
  • No mechanical parts
• Low power consumption
  • Power wall at 25 W
Reliability? An SSD’s reliability is tightly coupled with that of its underlying storage medium: NAND Flash memories.
NAND Flash memories are subject to progressive wear-out driven by endurance stress (program and erase cycles, i.e., P/E cycles) and by data retention.
[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10 kP/E cycles)]
A direct indicator of such wear-out is the raw bit error rate (RBER).
Solution: To improve RBER figures, and thus lower the percentage of uncorrectable pages, NAND Flash vendors have introduced the Read Retry (RR) algorithm.
[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10 kP/E cycles)]
Take away: when RR is used, the memory read time increases with respect to the normal read time (up to 256% in the tested device).
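The read-time penalty can be illustrated with a minimal sketch of a generic retry loop. This is not the vendor's RR algorithm, which the poster does not detail: it assumes that each retry re-reads the page with a different read reference level, that every attempt costs one full page read, and that the ECC corrects up to `t` bit errors. The function name, its parameters and the per-level RBER values are hypothetical.

```python
import random

def read_with_retry(rber_by_level, n_bits, t, t_read_us, rng):
    """Try each read reference level in order until the ECC can correct the page.

    rber_by_level: hypothetical RBER at the default level (index 0) and at
                   each successive retry level (assumed values, not measured).
    Returns (decoded_ok, retries_used, total_read_time_us).
    """
    total_us = 0.0
    for level, rber in enumerate(rber_by_level):
        total_us += t_read_us  # assumption: every attempt costs one page read
        # Toy error model: each bit flips independently with probability rber.
        errors = sum(rng.random() < rber for _ in range(n_bits))
        if errors <= t:  # the ECC corrects up to t bit errors
            return True, level, total_us
    return False, len(rber_by_level) - 1, total_us
```

Under these assumptions the total read time grows linearly with the number of retries, which is one way to see why RR trades latency for a lower effective RBER.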
Problem: Increasing RBERs result in a higher probability of erroneously decoding the bits read from a page (percentage of uncorrectable pages).
Take away 1: The percentage of uncorrectable pages is tightly coupled with the ECC’s correction capabilities.
Take away 2: As soon as it becomes higher than zero, the whole SSD is considered no longer reliable.
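The link between RBER, ECC correction capability and uncorrectable pages can be sketched with a simple binomial model. This is an illustrative assumption, not the method used for the measurements: bit errors are treated as independent, and a codeword is uncorrectable when it contains more than `t` errors. The function name is hypothetical.

```python
import math

def uncorrectable_probability(rber, n_bits, t):
    """P(more than t bit errors among n_bits), errors i.i.d. with probability rber."""
    if rber <= 0.0:
        return 0.0
    if rber >= 1.0:
        return 1.0 if n_bits > t else 0.0
    log_p = math.log(rber)
    log_q = math.log1p(-rber)
    tail = 0.0
    # Sum the upper tail directly, in log space, so tiny probabilities keep precision.
    for k in range(t + 1, n_bits + 1):
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        tail += math.exp(log_term)
    return min(1.0, tail)
```

For example, `uncorrectable_probability(1e-3, 4320 * 8, 100)` models a code correcting 100 bits over 4320 bytes; treating all 4320 bytes as the codeword length, with no separate parity, is an assumption made here for illustration only.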
[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10 kP/E cycles)]
Take away: To improve SSD reliability, advanced ECCs must be used
When looking at SSDs where Quality of Service (QoS) standards must be met, exploiting RR techniques could introduce a performance/reliability trade-off for the whole system
SSD’s bandwidth and latencies have been collected by means of the SSDExplorer co-simulation framework
[Diagram labels: RBER and percentage of uncorrectable pages characteristics; host system command submission/completion timings]
Queue depth (1 thread)   Real device read BW   SSDExplorer read BW   Matching (BW delta)
32                       642691 KB/s           626112 KB/s           -2.58%
64                       813615 KB/s           822295 KB/s           1.07%
128                      861639 KB/s           870096 KB/s           0.98%
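The BW delta column can be reproduced as the relative difference between the simulated and the real read bandwidth. The poster does not state the formula, so this definition is an assumption, but it agrees with the three tabulated values. The function name is hypothetical.

```python
def bw_delta_percent(real_kbps, simulated_kbps):
    """Relative bandwidth difference of the simulator versus the real device, in percent."""
    return (simulated_kbps - real_kbps) / real_kbps * 100.0

# (queue depth, real device KB/s, SSDExplorer KB/s), taken from the table above
rows = [(32, 642691, 626112), (64, 813615, 822295), (128, 861639, 870096)]
for queue_depth, real, simulated in rows:
    print(f"QD {queue_depth}: {bw_delta_percent(real, simulated):+.2f}%")
```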
Hardware/Software Co-simulation
Simulated architecture: 8 channels, 4 targets per channel, 512 GB
ECC: multi-threaded BCH, 100 bit / 4320 bytes
[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10 kP/E cycles)]
SSD’s read latency distributions: minimum, 25th percentile, median, 75th percentile and maximum were measured.
An enterprise-class QoS of 5 ms has been set as the reference.
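A minimal sketch of how such a latency distribution could be summarized and compared against the 5 ms reference follows. The function names are hypothetical, and checking the maximum latency against the limit is an assumption: the poster does not say which statistic the QoS reference applies to.

```python
import statistics

QOS_LIMIT_MS = 5.0  # enterprise-class reference quoted above

def latency_summary(latencies_ms):
    """Return the minimum, 25th percentile, median, 75th percentile and maximum."""
    p25, median, p75 = statistics.quantiles(latencies_ms, n=4, method="inclusive")
    return {
        "min": min(latencies_ms),
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": max(latencies_ms),
    }

def meets_qos(latencies_ms, limit_ms=QOS_LIMIT_MS):
    """Assumption: QoS holds only if the worst-case latency stays within the limit."""
    return max(latencies_ms) <= limit_ms
```

A stricter or looser criterion (for instance a high percentile instead of the maximum) would only change the body of `meets_qos`.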
Conclusions:
1. RR is able to enhance both endurance and data retention. In the measured device, endurance and retention were extended by up to 31% and 300%, respectively.
2. SSD read performance and latencies were heavily impacted by RR. If QoS has to be guaranteed, the actual endurance and data retention extensions provided by RR settle around 10% and 7%, respectively.
SSD’s read bandwidth.
[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10 kP/E cycles)]
The QoS limit marks a 10% performance degradation compared to the beginning of life (endurance panel) and to the beginning of the retention time (retention panel).
[Figure annotation, both panels: ECC/SSD fail point (ECCTH)]
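The 10% degradation criterion can be expressed as a short search over a bandwidth curve. The sketch below assumes the curve is available as ordered `(x, read_bandwidth)` samples, where `x` is either the P/E cycle count or the retention time and the first sample is the beginning-of-life (or beginning-of-retention) baseline. The function name and the data layout are hypothetical.

```python
def qos_limit_point(samples, max_degradation=0.10):
    """Return the first x whose bandwidth falls below (1 - max_degradation) * baseline.

    samples: list of (x, read_bandwidth) ordered by x; the first entry is the baseline.
    Returns None if the bandwidth never drops below the QoS limit.
    """
    baseline = samples[0][1]
    floor = baseline * (1.0 - max_degradation)
    for x, bandwidth in samples:
        if bandwidth < floor:
            return x
    return None
```

Comparing the point returned here with the ECC/SSD fail point gives the margin between the QoS-constrained lifetime and the reliability-constrained one.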