
Controlling QoS Metrics in NVMe SSDs


Quality of Service Implications of the Error Correction Techniques in Solid State Drives

Lorenzo Zuolo†, Cristian Zambelli†, Rino Micheloni‡, Stephen Bates‡ and Piero Olivo† †Dipartimento di Ingegneria – Università di Ferrara (Italy)

‡PMC-Sierra [email protected]

Solid State Drives (SSDs) are now the most effective solution for mass storage applications

• High performance
  • Read and write bandwidth up to GB/s
  • Million IOPS

• High robustness
  • No mechanical parts

• Low power consumption
  • Power wall at 25 W

Reliability? An SSD’s reliability is tightly coupled with that of its underlying storage medium: NAND Flash memories

NAND Flash memories are subject to progressive wear-out due to endurance (program/erase cycles, i.e., P/E cycles) and data retention

[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]

A direct indicator of such wear-out is the raw bit error rate (RBER)

Solution: To improve RBER figures, and thus lower the percentage of uncorrectable pages, NAND Flash vendors have introduced the Read Retry (RR) algorithm

[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]

Take away: when RR is used, the memory read time increases with respect to the normal read time (up to 256% in the tested device)
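The latency cost of RR can be illustrated with a minimal sketch. It assumes a generic retry loop (re-read the page with a shifted threshold setting until the ECC decodes it or the retry budget runs out); the poster does not describe the vendor's actual algorithm. The names `read_with_retry`, `make_toy_page` and `toy_decode`, the error counts and the 70 µs timings are all hypothetical.

```python
def read_with_retry(read_page, decode, max_retries, t_read_us, t_retry_us):
    """Read a page, retrying with shifted thresholds while decoding fails.

    Returns (data, latency_us); data is None if every attempt fails.
    """
    latency = t_read_us
    ok, data = decode(read_page(0))        # attempt 0: default read thresholds
    attempt = 0
    while not ok and attempt < max_retries:
        attempt += 1
        latency += t_retry_us              # each retry adds another array read
        ok, data = decode(read_page(attempt))
    return (data if ok else None), latency


def make_toy_page(errors_per_level):
    """Toy NAND page: the raw bit error count depends on the retry level (assumed)."""
    def read_page(level):
        return errors_per_level[min(level, len(errors_per_level) - 1)]
    return read_page


def toy_decode(bit_errors, t=100):
    """Toy ECC: decoding succeeds when the error count is within t (assumed)."""
    ok = bit_errors <= t
    return ok, ("payload" if ok else None)


# Hypothetical page: 150 errors at default thresholds, 90 after two retries.
result = read_with_retry(make_toy_page([150, 120, 90]), toy_decode,
                         max_retries=4, t_read_us=70.0, t_retry_us=70.0)
print(result)  # ('payload', 210.0)
```

In this toy case the page is recovered, but the read takes three times the nominal read time, which is the kind of overhead the take away refers to.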

Problem: An increasing RBER results in a higher probability of erroneously decoding the bits read from a page (percentage of uncorrectable pages).

Take away 1: The percentage of uncorrectable pages is tightly coupled with the ECC’s correction capabilities

Take away 2: As soon as it becomes higher than zero, the whole SSD is considered no longer reliable
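The coupling between RBER and ECC correction capability can be made concrete with a small calculation. The sketch below assumes independent bit errors, so the number of errors in a codeword is binomial and a codeword is uncorrectable when it holds more than t errors. The values t = 100 and 4320 bytes are the BCH figures quoted for the simulated SSD in this poster; treating 4320 bytes (34560 bits) as the full codeword length is an assumption, and the RBER values are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability of exactly k errors."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def uncorrectable_probability(rber, n_bits, t):
    """P(more than t bit errors in n_bits), assuming independent bit errors."""
    if rber <= 0.0:
        return 0.0
    if rber >= 1.0:
        return 1.0 if n_bits > t else 0.0
    tail = sum(math.exp(log_binom_pmf(k, n_bits, rber))
               for k in range(t + 1, n_bits + 1))
    return min(1.0, tail)


N_BITS = 4320 * 8   # assumed codeword length in bits
T = 100             # correctable bits per codeword

for rber in (1e-3, 2e-3, 3e-3):   # hypothetical RBER values
    print(f"RBER={rber:.0e}  P(uncorrectable)={uncorrectable_probability(rber, N_BITS, T):.3e}")
```

Under these assumptions the failure probability stays negligible while the expected error count (n_bits × RBER) is well below t, and rises steeply once it approaches t.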

[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]

Take away: To improve an SSD’s reliability, advanced ECCs must be used

When looking at SSDs where Quality of Service (QoS) standards must be met, exploiting RR techniques could introduce a performance/reliability trade-off for the whole system

SSD’s bandwidth and latencies have been collected by means of the SSDExplorer co-simulation framework

• RBER and percentage of uncorrectable pages characteristics
• Host system command submission/completion timings

Queue depth (1 thread)   Real device read BW   SSDExplorer read BW   Matching (BW delta)
32                       642691 KB/s           626112 KB/s           -2.58%
64                       813615 KB/s           822295 KB/s           1.07%
128                      861639 KB/s           870096 KB/s           0.98%
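The BW delta figures can be reproduced from the two bandwidth columns. The sketch below assumes the delta is the simulated bandwidth relative to the real device, (simulated − real) / real; the poster does not state the formula, but this convention matches the signs and values reported. The function name `bw_delta_percent` is illustrative.

```python
def bw_delta_percent(real_kbps, simulated_kbps):
    """Relative difference of the simulated bandwidth vs. the real device, in %."""
    return (simulated_kbps - real_kbps) / real_kbps * 100.0


# Queue depth -> (real device read BW, SSDExplorer read BW), both in KB/s.
measurements = {
    32: (642691, 626112),
    64: (813615, 822295),
    128: (861639, 870096),
}

deltas = {qd: round(bw_delta_percent(real, sim), 2)
          for qd, (real, sim) in measurements.items()}
print(deltas)  # {32: -2.58, 64: 1.07, 128: 0.98}
```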

Hardware/Software Co-simulation

Simulated architecture: 8 channels, 4 targets per channel, 512 GB
ECC: multi-threaded BCH, 100 bit/4320 Bytes

[Figure panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]

SSD’s read latency distributions: minimum, 25th percentile, median, 75th percentile and maximum were measured.

An enterprise-class QoS of 5 ms has been set as reference
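The five statistics listed above, and a check against the 5 ms reference, can be sketched as follows. The latency samples are hypothetical, and comparing the maximum latency with the QoS value is an assumption: the poster does not say which statistic the 5 ms reference applies to. Quartiles are computed with the standard library's inclusive method, which is one of several possible percentile definitions.

```python
import statistics


def latency_summary(latencies_ms, qos_ms=5.0):
    """Five-number summary of read latencies plus a QoS check on the maximum."""
    q1, median, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")
    worst = max(latencies_ms)
    return {
        "min": min(latencies_ms),
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": worst,
        "meets_qos": worst <= qos_ms,   # assumed criterion: worst case within QoS
    }


# Hypothetical read latencies in milliseconds.
fresh = latency_summary([0.2, 0.4, 0.6, 0.8, 1.0])
worn = latency_summary([0.3, 0.9, 2.5, 4.0, 6.0])
print(fresh["meets_qos"], worn["meets_qos"])  # True False
```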

Conclusions:
1. RR is able to enhance both endurance and data retention. In the measured device, endurance and retention were extended by up to 31% and 300%, respectively.
2. SSD read performance and latencies were heavily impacted by RR. If QoS has to be guaranteed, the actual endurance and data retention extensions provided by RR settle around 10% and 7%, respectively.

[Figure: SSD’s read bandwidth. Panels: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]

The QoS limit marks a 10% performance degradation compared to the beginning of life (endurance) and the beginning of the retention time.
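Locating the QoS limit on a bandwidth curve amounts to finding the first point that falls more than 10% below the starting value. The sketch below assumes the curve is given as (x, bandwidth) samples ordered along the P/E-cycle or retention-time axis, with the first sample as the baseline; the sample values and the function name `first_qos_violation` are hypothetical.

```python
def first_qos_violation(samples, max_degradation=0.10):
    """Return the x value of the first sample below the QoS floor, or None.

    samples: list of (x, read_bw) ordered by x; the first entry is the baseline.
    """
    baseline_bw = samples[0][1]
    floor = baseline_bw * (1.0 - max_degradation)
    for x, bw in samples:
        if bw < floor:
            return x
    return None


# Hypothetical read bandwidth (MB/s) versus P/E cycles.
curve = [(0, 800.0), (4000, 780.0), (8000, 730.0), (12000, 700.0)]
print(first_qos_violation(curve))  # 12000
```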

[Figure annotation: ECC/SSD fail point (ECC_TH)]