Introduction to Open-Channel Solid State Drives and What's Next!
Matias Bjørling, Director, Solid-State System Software
September 25th, 2018
Storage Developer Conference 2018, Santa Clara, CA ©2018 Western Digital Corporation or its affiliates. All rights reserved.
Forward-Looking Statements / Safe Harbor | Disclaimers

This presentation contains forward-looking statements that involve risks and uncertainties, including, but not limited to,
statements regarding our solid-state technologies, product development efforts, software development and potential
contributions, growth opportunities, and demand and market trends. Forward-looking statements should not be read as a
guarantee of future performance or results, and will not necessarily be accurate indications of the times at, or by, which
such performance or results will be achieved, if at all. Forward-looking statements are subject to risks and uncertainties
that could cause actual performance or results to differ materially from those expressed in or suggested by the forward-
looking statements.
Key risks and uncertainties include volatility in global economic conditions, business conditions and growth in the storage
ecosystem, impact of competitive products and pricing, market acceptance and cost of commodity materials and
specialized product components, actions by competitors, unexpected advances in competing technologies, difficulties or
delays in manufacturing, and other risks and uncertainties listed in the company's filings with the Securities and Exchange
Commission (the "SEC") and available on the SEC's website at www.sec.gov, including our most recently filed periodic
report, to which your attention is directed. We do not undertake any obligation to publicly update or revise any forward-
looking statement, whether as a result of new information, future developments or otherwise, except as required by law.
Agenda
1. Motivation
2. Interface
3. Eco-system
4. What's Next? Standardization
0% Writes - Read Latency
[Figure: 4K random read latency across I/O percentiles, 4K random read / 4K random write workload]
20% Writes - Read Latency
Significant outliers! Worst case 30X (4 ms!)
[Figure: 4K random read latency across I/O percentiles, 4K random read / 4K random write workload]
NAND Chip Density Continues to Grow While Cost/GB Decreases
[Figure: 3D NAND layer count by year: 48 layers (2015), 64 (2017), 96 (2018); cell types SLC, MLC, TLC, QLC]
Ubiquitous Workloads
Efficiency of the cloud requires a single SSD to serve many different workloads: databases, sensors, analytics, virtualization, video.
[Figure: multiple workloads sharing one SSD]
Solid State Drive Internals

• NAND Read/Program/Erase
– Read (50-100 µs), Write/Program (1-10 ms), Erase (3-15 ms)
• Highly parallel architecture
– Tens of dies behind a media interface controller
• Translation layer
– Logical-to-physical translation map
– Wear-leveling
– Garbage collection
– Bad block management
– Media error handling
– Etc.
• Read/Program/Erase on the media -> Read/Write at the host interface (NVMe)
[Figure: NVMe host interface and translation layer in front of a media interface controller driving multiple channels of NAND dies (Die0-Die3 per channel)]
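The indirection the translation layer performs can be made concrete with a small sketch. This is not the slides' implementation; the class and field names are mine, and the geometry is toy-sized. It shows the core behavior described above: every write, even to an already-written logical address, lands on a fresh physical page, leaving a stale copy behind for garbage collection.

```python
# Minimal sketch of the logical-to-physical (L2P) translation an SSD's
# flash translation layer performs. Illustrative only: names and geometry
# are assumptions, not from the slides.

class SimpleFTL:
    """Log-structured FTL: every write goes to the next free physical page."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.l2p = {}          # logical page number -> physical page number
        self.next_free = 0     # append-only write frontier

    def write(self, lpn, data_store, data):
        # Writes are always indirect: the logical page is remapped to a
        # fresh physical page, and the previous physical copy becomes
        # stale garbage that garbage collection must later reclaim.
        if self.next_free >= self.num_pages:
            raise RuntimeError("device full: garbage collection required")
        ppn = self.next_free
        self.next_free += 1
        data_store[ppn] = data
        self.l2p[lpn] = ppn

    def read(self, lpn, data_store):
        return data_store[self.l2p[lpn]]
```

Overwriting logical page 0 twice leaves the first physical copy stale; the background work needed to reclaim such pages is exactly the internal traffic that causes the read-latency outliers shown earlier.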
Single-User Workloads
Indirection and indirect writes cause outliers

• Host: log-on-log
– A log-structured database (e.g., RocksDB) writes via pread/pwrite to a log-structured file system, which writes through the block layer (Read/Write/Trim) to the solid-state drive.
– Each layer performs its own metadata management, address mapping, and garbage collection.
• Device: indirect writes
– The drive maps logical data to physical locations on a best-effort basis.
– The host is oblivious to data placement due to the indirection.
– The device is unable to align data logically, which increases write amplification and triggers extra garbage collection.
[Figure: log-on-log host stack (user space database, kernel VFS/file system, block layer) above the SSD pipeline of write buffer, NAND controller, and dies]
Open-Channel SSDs

• I/O Isolation
• Predictable Latency
• Data Placement & I/O Scheduling
• Device Wear-Leveling
Solid State Drive Internals (Open-Channel)

• Host responsibility
– Logical-to-physical translation map
– Garbage collection
– Logical wear-leveling (device hints to the host where to place hot/cold data)
• Device responsibility
– Device wear-leveling, bad block management, media error handling
– Read/Program/Erase through the media interface controller
• Integration
– Block device: host-side FTL that does L2P, GC, and logical wear-leveling; similar overhead to traditional SSDs
– Applications: databases and file systems
[Figure: responsibilities split between the host (NVMe device driver and up) and the device (media interface controller and NAND dies)]
Concepts in an Open-Channel SSD
Interface building blocks

• Chunks
– Sequential-write-only LBA ranges
– Align writes to internal block sizes
• Hierarchical addressing
– A sparse addressing scheme projected onto the NVMe™ LBA address space
• Host-assisted media refresh
– Improve I/O predictability
• Host-assisted wear-leveling
– Improve wear-leveling
Chunks #1
Enable orders-of-magnitude reduction of device-side DRAM

• A chunk is a range of LBAs where writes must be sequential
– Reduces DRAM for the L2P table by orders of magnitude
– Enables hot/cold data separation
• Rewriting a chunk requires a reset
• A chunk is in one of four states (free/open/closed/offline)
• An open chunk has an associated write pointer
• Same device model as the ZAC/ZBC standards
• A similar device model is to be standardized in NVMe (more on this later)
[Figure: a namespace, from LBA 0 to the max LBA, divided into Chunk 0 ... Chunk X]
Chunks #2

• Drive capacity is divided into chunks
• Chunk types
– Conventional: random or sequential writes
– Sequential Write Required: the chunk must be written sequentially only, and must be reset entirely before being rewritten
• Write commands advance the write pointer; reset commands rewind it
[Figure: the drive's LBA range divided into chunks, each with a write pointer position]
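The chunk rules from the two slides above (four states, sequential-only writes at the write pointer, reset before rewrite) can be sketched as a small state machine. This is a hedged illustration, not the specification's wording: the class and method names are mine, while the state names match the slides.

```python
# Sketch of a Sequential Write Required chunk: writes must land exactly at
# the write pointer, a full chunk closes, and a reset rewinds the pointer.
# State names (free/open/closed/offline) follow the slides; everything
# else is illustrative.

class Chunk:
    def __init__(self, num_lbas):
        self.num_lbas = num_lbas
        self.state = "free"
        self.write_pointer = 0

    def write(self, lba):
        if self.state == "offline":
            raise IOError("chunk is offline (worn out)")
        # Sequential-only: the write must land exactly at the write pointer.
        if lba != self.write_pointer:
            raise IOError("out-of-order write rejected")
        if self.state == "free":
            self.state = "open"          # first write opens the chunk
        self.write_pointer += 1
        if self.write_pointer == self.num_lbas:
            self.state = "closed"        # fully written; reset before rewrite

    def reset(self):
        # Rewriting requires a reset, which rewinds the write pointer.
        if self.state == "offline":
            raise IOError("cannot reset an offline chunk")
        self.state = "free"
        self.write_pointer = 0
```

Because writes are append-only within a chunk, the device only needs to track one write pointer per chunk instead of a fine-grained L2P entry per LBA, which is where the DRAM reduction comes from.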
Hierarchical Addressing
Channels and dies are mapped to logical groups and parallel units

• Expose device parallelism through groups and parallel units (PUs)
• One die, or a group of dies, is exposed as a parallel unit to the host
• Parallel units are a logical representation
• The LBA address space is organized hierarchically: Group -> Parallel Unit (PU) -> Chunk -> LBA
[Figure: physical channels and dies (LUNs) on the left, mapped onto the logical NVMe namespace of groups, parallel units, and chunks on the right]
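The hierarchical address can be packed into a single sparse NVMe LBA, which is how the scheme is projected onto the LBA address space. A minimal sketch, assuming illustrative bit widths (a real device reports its own field lengths via its geometry):

```python
# Sketch of packing (group, parallel unit, chunk, sector) into one LBA.
# The bit widths below are assumptions for illustration; an Open-Channel
# 2.0 device advertises its actual field lengths in its geometry.

GRP_BITS, PU_BITS, CHK_BITS, SEC_BITS = 2, 3, 10, 12

def encode_lba(group, pu, chunk, sector):
    assert group < (1 << GRP_BITS) and pu < (1 << PU_BITS)
    assert chunk < (1 << CHK_BITS) and sector < (1 << SEC_BITS)
    # Fields are concatenated most-significant first: group, PU, chunk, sector.
    return (((group << PU_BITS | pu) << CHK_BITS | chunk) << SEC_BITS) | sector

def decode_lba(lba):
    sector = lba & ((1 << SEC_BITS) - 1)
    lba >>= SEC_BITS
    chunk = lba & ((1 << CHK_BITS) - 1)
    lba >>= CHK_BITS
    pu = lba & ((1 << PU_BITS) - 1)
    group = lba >> PU_BITS
    return (group, pu, chunk, sector)
```

The address space is sparse: most LBA values are unused when a field's populated count is below its power-of-two capacity, but decoding stays a cheap shift-and-mask, and the host can target a specific parallel unit simply by constructing the right LBA.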
Host-assisted Media Refresh
Enable the host to assist SSD data refresh

• An SSD refreshes its data periodically to maintain reliability; it does this through a data-scrubbing process
– Internal reads and writes make the drive's I/O latencies unpredictable
– Writes dominate the I/O outliers
• Two-step data refresh
– Step 1: the device performs only the read part of data scrubbing and notifies the host via an NVMe AER (chunk notification entry)
– Step 2: the host moves the data if necessary; data movement is managed by the host
• Increases the predictability of the drive; the host manages the refresh strategy (Should it refresh? Is there a copy elsewhere?)
[Figure: OCSSD performs read-only scrubbing across its dies (step 1), raises an NVMe AER chunk notification, and the host refreshes the data if necessary (step 2)]
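Step 2 of the flow above is a host-side policy decision. The sketch below is an assumption-laden illustration of that decision, not the real notification handling path: the notification is reduced to a chunk id plus a replica flag, and `read_chunk`/`write_chunk` are hypothetical callbacks standing in for the actual I/O.

```python
# Hedged sketch of the host's side of 2-step data refresh: on a chunk
# notification, either skip the refresh (a replica exists elsewhere, e.g.
# on another node) or rewrite the data to a fresh chunk itself, keeping
# the copy traffic off the device. All names here are illustrative.

def handle_chunk_notification(chunk_id, has_replica_elsewhere,
                              read_chunk, write_chunk):
    """Return the action taken for a chunk the device flagged as degrading."""
    if has_replica_elsewhere:
        # No copy needed: the data can be re-fetched from the replica, so
        # the host can simply reset and reuse the chunk later.
        return "skipped"
    # Host-managed data movement: read the aging chunk and rewrite it,
    # so the device never issues unpredictable internal writes.
    data = read_chunk(chunk_id)
    write_chunk(data)
    return "refreshed"
```

Because the host schedules these rewrites itself, it can defer them to idle periods or drop them entirely for replicated data, which is exactly why this improves I/O predictability.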
Host-assisted Wear-Leveling
Enable the host to separate hot/cold data across chunks depending on wear

• An SSD typically does not know the temperature of newly written data
– Placing hot and cold data together increases write amplification
– Write amplification is often 4-5X for SSDs with no optimizations
• Chunk characteristics
– Limited reset cycles (as NAND blocks have limited erase cycles)
– Place cold data on chunks that are nearer end-of-life and use younger chunks for hot data
• Approach
– Introduce a per-chunk relative wear-level indicator (WLI)
– The host knows the workload and places data with respect to the WLI
– Reduces garbage collection -> increases lifetime and improves I/O latency and performance
[Figure: left, a conventional SSD where data migrates from hot to warm to cold superblocks through repeated GC passes; right, host writes placed directly on chunks by WLI (e.g., 0, 33%, 90%)]
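The WLI-driven placement policy described above can be sketched in a few lines. The chunk records and function name below are mine; only the policy (cold data to worn chunks, hot data to young chunks) comes from the slide.

```python
# Sketch of WLI-based chunk allocation: cold data is parked on the
# most-worn free chunk, hot data on the least-worn one. Chunk records are
# plain dicts for illustration; field names are assumptions.

def pick_chunk(free_chunks, temperature):
    """free_chunks: list of {'id': ..., 'wli': 0-100}; temperature: 'hot' or 'cold'."""
    if not free_chunks:
        raise RuntimeError("no free chunks: garbage collection required")
    if temperature == "cold":
        # Cold data is rarely rewritten, so it can safely occupy chunks
        # near end-of-life without consuming their few remaining resets.
        return max(free_chunks, key=lambda c: c["wli"])
    # Hot data causes frequent resets, so spend the plentiful remaining
    # cycles of younger chunks on it.
    return min(free_chunks, key=lambda c: c["wli"])
```

With the slide's example wear levels (0, 33%, 90%), cold data lands on the 90% chunk and hot data on the fresh one, so hot and cold never mix in one chunk and garbage collection has far less live data to relocate.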
Interface Summary
Together, the concepts provide:

• I/O isolation through the use of groups and parallel units
• DRAM and over-provisioning reduction through append-only chunks
• Direct-to-media access that avoids expensive internal data movement
• Fine-grained data refresh managed by the host
• Reduced write amplification by enabling the host to place hot/cold data efficiently

Specification available at http://lightnvm.io
Eco-system
A large eco-system through Zoned Block Devices and OCSSD

• Linux® kernel
– NVMe device driver
• Detection of OCSSDs
• Support for the 1.2 and 2.0 specifications
• Registers with the LightNVM subsystem
• Registers as a zoned block device (patches available)
– LightNVM subsystem
• Core functionality
• Target management
– Target interface
• Enumerate, get geometry, I/O interface, etc.
– pblk host-side FTL
• Maps an OCSSD to a block device
• User space
– libzbc, fio (ZBD support), liblightnvm
– SPDK
[Figure: the OCSSD 2.0 stack: NVMe driver and LightNVM subsystem in the kernel, pblk exposing a logical block device to regular file systems (xfs) and SMR-aware file systems (f2fs, btrfs), liblightnvm in user space, and the Flexible I/O Tester (fio) exercising the stack]
Open-Source Software Contributions

• Initial release of the subsystem in Linux kernel 4.4 (January 2016)
• User-space library (liblightnvm) support upstream in Linux kernel 4.11 (April 2017)
• pblk available in Linux kernel 4.12 (July 2017)
• Open-Channel SSD 2.0 specification released (January 2018); support available from Linux kernel 4.17 (May 2018)
• SPDK support for OCSSD (June 2018)
• fio with zone support (August 2018)
• Upcoming
– OCSSD as a zoned block device (patches available)
– RAIL: XOR support for lower latency
– 2.0a revision
Tools and Libraries

• LightNVM: The Linux Open-Channel SSD Subsystem: https://www.usenix.org/conference/fast17/technical-sessions/presentation/bjorling
• LightNVM: http://lightnvm.io
• LightNVM Linux kernel subsystem: https://github.com/OpenChannelSSD/linux
• liblightnvm: https://github.com/OpenChannelSSD/liblightnvm
• QEMU NVMe with Open-Channel SSD support: https://github.com/OpenChannelSSD/qemu-nvme
Western Digital and the Western Digital logo are registered trademarks or trademarks of Western Digital Corporation or its affiliates in the US and/or other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. The NVMe word mark is a trademark of NVM Express, Inc. All other marks are the property of their respective owners.