
Open-Channel SSDs Offer the Flexibility Required by Hyperscale Infrastructure

Matias Bjørling

CNEX Labs

2017 Storage Developer Conference. © CNEX Labs. All Rights Reserved.


Public and Private Cloud Providers


Workloads and Applications

(Diagram: shared hardware hosting multiple instances)

Databases – Multi-Tenancy – Read- or Write-Heavy


NAND Capacity Keeps Growing

Performance – Endurance – DRAM overheads


Read Latency with 0% Writes

(Chart: Random Read 4K – read latency vs. percentiles)


Read Latency with 20% Writes

(Chart: Random Read 4K + Random Write 4K – read latency vs. percentiles)

Significant outliers! Worst-case 30X – 4 ms!


Requirements

1. Flexible enough for software to evolve faster than hardware

2. Strong QoS Guarantees

3. Continuous access – No maintenance windows

4. Support a broad set of applications on shared hardware

5. Rapid enablement of new NAND generations / media agnostic

6. Vendor neutrality & supply chain diversity


Open-Channel SSDs

I/O Isolation – Enable I/O isolation between tenants by allocating your SSD into separate parallel units.

Predictable Latency – No more guessing when an I/O completes. You know which parallel unit is accessed on disk.

Data Placement & I/O Scheduling – Manage your non-volatile memory as a block device, through a file system, or inside your application.


Rebalance the Storage Interface

Expose internal lanes to host

Each lane is a parallel unit

Logical or Physical

Performance characteristics

Building Block

Similar to HDD SMR interface

Extension to NVMe

Can implement I/O Determinism, streams, and other new data management schemes

Open specification


Rebalance the Storage Interface

Expose geometry

Logical/Physical geometry

Performance characteristics

Controller functionalities

Similar to HDD SMR interface

Exposes parallel units to the host

Enable efficient access to the parallel units

Encode parallel units into the address space

# Channels, # Parallel Units, # Chunks, Chunk Size, Min. Write size, Optimal Write size, … (see the sketch below)

(Diagram: address space LBA0–LBAX divided across PU0, PU1, PU2, PU3)

Up to the SSD vendor
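The geometry parameters above can be pictured as a small descriptor the host reads from the drive. The following is a minimal sketch only; the struct and field names are hypothetical illustrations, not the specification's actual data format.

    #include <stdint.h>

    /* Hypothetical host-side view of the geometry an Open-Channel SSD
     * reports: how much parallelism exists and what the write rules are.
     * Names are illustrative, not the spec's wire format. */
    struct ocssd_geometry {
        uint32_t num_channels;       /* # Channels (shared buses)           */
        uint32_t num_parallel_units; /* # Parallel Units (LUNs) per channel */
        uint32_t num_chunks;         /* # Chunks per parallel unit          */
        uint32_t chunk_size;         /* sectors per chunk                   */
        uint32_t min_write_size;     /* minimum write granularity (sectors) */
        uint32_t opt_write_size;     /* optimal write granularity (sectors) */
        uint32_t sector_size;        /* bytes per sector                    */
    };

Multiplying num_channels × num_parallel_units × num_chunks × chunk_size × sector_size gives the capacity exposed to the host.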


Internals of an SSD

(Diagram: a Solid-State Drive with a Host Interface, a Flash Translation Layer, a Media Controller, and media channels X and Y, each holding parallel units)

Responsibilities: Host Interface (Read/Write); Flash Translation Layer (transforms R/W/E to R/W, manages media constraints); Media Controller (ECC, RAID, Retention); Media Error Handling; Media Retention Management

Media: Read/Write/Erase – tens of parallel units! Read (50-100 µs), Write (1-10 ms), Erase (3-15 ms)


Move Data Placement & I/O Scheduling to the Host

(Diagram: the Flash Translation Layer moves out of the Solid-State Drive and into the host; the drive keeps the Host Interface, Media Controller, Media Error Handling, and Media Retention Management, and exposes its LUNs over channels X and Y. Read (50-100 µs), Write (1-10 ms), Erase (3-15 ms))



Parking Lot


Unsupervised Parking Lot



Supervised Parking Lot

(Diagram: parking spots annotated with expected stay durations of 1h and 24h)


Open-Channel SSD 2.0 Specification

Expose a description of parallelism within SSD

Expose #Channels, #LUNs (PUs), #Chunks, Write sizes, …

Specification enhanced based on requests from customers and SSD vendors

Simplified

Log-structured. Write sequentially within a chunk

Minimum write size, optimal write size.

Media-agnostic

Works on NAND as well as PCM and other non-volatile memories

Exposed through constructs similar to zones on SMR hard drives


Drive Model - Chunks

A chunk defines the area where writes must be sequential

Reads – Sector-size granularity

Writes – Minimum write size granularity

Synchronous – May fail – An error only marks the write bad, not the full SSD

Reset Chunk (Erase)

Synchronous – May fail – An error only marks the chunk bad, not the full SSD

(Diagram: a chunk is an array of LBAs 0, 1, … LBA-1; a parallel unit holds chunks 0, 1, … Chunk-1)
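Because writes within a chunk must land sequentially and in multiples of the minimum write size, a host-side FTL typically keeps a per-chunk write pointer and validates writes against it. The sketch below illustrates only that rule; the struct, function, and parameter names are hypothetical, and real chunk-state handling (including the spec's chunk metadata) is richer than this.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-chunk bookkeeping on the host: writes must start at
     * the current write pointer; a chunk reset rewinds it to sector 0. */
    struct chunk_state {
        uint32_t write_ptr; /* next writable sector within the chunk       */
        bool     bad;       /* set when a write or reset reported an error */
    };

    /* Check a write of 'nsectors' starting at chunk-relative sector 'slba'
     * against the sequential-write and minimum-write-size rules. */
    static bool chunk_write_ok(const struct chunk_state *c, uint32_t slba,
                               uint32_t nsectors, uint32_t min_write_size,
                               uint32_t chunk_size)
    {
        if (c->bad)
            return false;                     /* chunk already marked bad  */
        if (slba != c->write_ptr)
            return false;                     /* writes must be sequential */
        if (nsectors == 0 || nsectors % min_write_size != 0)
            return false;                     /* minimum write granularity */
        return slba + nsectors <= chunk_size; /* stay inside the chunk     */
    }

On success the caller advances write_ptr by nsectors; a Reset Chunk sets it back to 0.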


Drive Model - Organization

Parallelism exposed over

Channels (Shared bus)

LUNs – Parallel Units

Logical or Physical representation

(Diagram: the host accesses the SSD over NVMe; the SSD contains channels 0, 1, … Channel-1, each channel contains LUNs 0, 1, … LUN-1, each LUN holds chunks 0, 1, … Chunk-1, and each chunk holds LBAs 0, 1, … LBA-1)
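Given that hierarchy, the host can convert a (channel, LUN, chunk, sector) tuple into a flat index and back. The function below is only a sketch of that arithmetic under the geometry just described; real Open-Channel addresses pack these fields into bit ranges reported by the device, which this illustration ignores.

    #include <stdint.h>

    /* Hypothetical flat indexing over the exposed hierarchy:
     * SSD -> channels -> LUNs (parallel units) -> chunks -> sectors. */
    static uint64_t flat_index(uint32_t ch, uint32_t lun, uint32_t chk,
                               uint32_t sec, uint32_t nluns,
                               uint32_t nchunks, uint32_t chunk_size)
    {
        return (((uint64_t)ch * nluns + lun) * nchunks + chk)
                   * chunk_size + sec;
    }

For example, with 8 LUNs per channel, 1024 chunks per LUN, and 4096 sectors per chunk, (channel 1, LUN 0, chunk 0, sector 0) maps to index 8 × 1024 × 4096 = 33,554,432.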


LightNVM Architecture

NVMe Device Driver

Detection of OCSSD

Implements OCSSD interface

LightNVM Subsystem

Core functionality – "Partition manager"

Target management (e.g., pblk)

Sysfs integration

High-level I/O Interface

Block device using pblk

Application integration with liblightnvm

(Diagram: Hardware – Open-Channel SSD; Kernel space – NVMe Device Driver, LightNVM Subsystem, pblk, File System; User space – Application(s). Interfaces to the subsystem: (1) Geometry, (2) Scalar Read/Write (optional), (3) Vectored R/W/E, using PPA Addressing)


Host-side Flash Translation Layer - pblk

Multi-target support - I/O isolation

Fully associative L2P table (4KB mapping)

Host-side write buffer to guarantee reads and writes

Cost-based garbage collector, using valid sector count as the metric.

Capacity-based rate limiter, designed as a function of the user and GC I/O present in the write buffer.

Scan-based L2P recovery: scan metadata in closed lines and the OOB area on open lines.

sysfs interface for statistics and FTL tuning.

(Diagram: pblk sits between the file system (make_rq) and the LightNVM subsystem / NVMe device driver on top of the Open-Channel SSD. Write path: add an entry to the write buffer; a write thread flushes it, with error handling and a GC/rate-limiting thread. Read path: L2P table lookup, served from the write buffer on a cache hit, otherwise read from the device.)
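The read path just described (L2P lookup, then either a write-buffer cache hit or a device read) can be sketched as follows. This is an illustration of the idea rather than pblk's actual code: the helper functions and the cache-bit encoding are hypothetical.

    #include <stdint.h>

    /* Illustrative flag marking an L2P entry that points into the
     * host-side write buffer instead of the media. */
    #define PPA_CACHED (1ULL << 63)

    /* Hypothetical helpers standing in for pblk internals. */
    extern uint64_t l2p_lookup(uint64_t lba);                      /* L2P table  */
    extern void read_from_write_buffer(uint64_t entry, void *buf); /* cache hit  */
    extern void read_from_device(uint64_t ppa, void *buf);         /* media read */

    /* Simplified pblk-style 4 KB read. */
    void pblk_style_read(uint64_t lba, void *buf)
    {
        uint64_t ppa = l2p_lookup(lba);

        if (ppa & PPA_CACHED) {
            /* Recently written sector still sits in the write buffer. */
            read_from_write_buffer(ppa & ~PPA_CACHED, buf);
        } else {
            /* Otherwise issue a read to the Open-Channel SSD. */
            read_from_device(ppa, buf);
        }
    }

Writes go the other way: add an entry to the write buffer, point the L2P entry at it, and let the write thread flush full entries to the device.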


Raw Performance

*Not final performance numbers

(Chart: Read/Write Throughput – Write (MB/s) and Read (MB/s) vs. channels in use, 1-16; y-axis 0-5000 MB/s)


Predictable Latency

4K reads during 64K concurrent writes

Consistent low latency at the 99.99th, 99.999th, and 99.9999th percentiles


Demo using Open-Channel SSD

Open-Channel SSD (Instance-based): 4K Reads + 256K Writes

Solid State Drive: 4K Reads + 256K Writes



Scalability

(Chart: latency in microseconds, 0-350, vs. percentiles, 0-100, for 3, 7, 15, 60, and 120 concurrent writes plus 1 read)


liblightnvm

User-space interface for interacting with Open-Channel SSDs using vectored I/O

Easy-to-use interface for geometry layout and I/O accesses

CLI Interface

struct nvm_dev

struct nvm_geo

struct nvm_addr

struct nvm_vblk

Terrific set of CLI tools

Great tutorial available

http://lightnvm.io/liblightnvm/tutorial/


(Diagram: I/O apps link against liblightnvm in user space; liblightnvm talks to the LightNVM subsystem and NVMe device driver in kernel space, which drive the Open-Channel SSD)
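A minimal usage sketch along the lines of the tutorial linked above: open a device, print its geometry, and close it. The device path is a placeholder for your own drive, and exact function signatures depend on the liblightnvm version you build against.

    #include <stdio.h>
    #include <liblightnvm.h>

    int main(void)
    {
        /* Open the Open-Channel SSD (adjust the path to your device). */
        struct nvm_dev *dev = nvm_dev_open("/dev/nvme0n1");
        if (!dev) {
            perror("nvm_dev_open");
            return 1;
        }

        /* Retrieve and print the geometry: channels, LUNs, chunks, sizes. */
        const struct nvm_geo *geo = nvm_dev_get_geo(dev);
        nvm_geo_pr(geo);

        nvm_dev_close(dev);
        return 0;
    }

From here, struct nvm_addr and struct nvm_vblk are used to address parallel units and to erase, write, and read them, as walked through in the tutorial.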


Status

Active community

Multiple drives in development by commercial SSD vendors

Multiple papers on Open-Channel SSDs

Growing software stack

LightNVM subsystem since Linux kernel 4.4.

User-space library (liblightnvm) support from Linux kernel 4.11.

pblk available from Linux kernel 4.12.


Conclusion

New storage interface that solves the software-defined storage challenges for hyperscalers

I/O Predictability

I/O Isolation

Application-level control of data placement and I/O scheduling

Draft specification is open and available for implementers

Formal standards initiative underway, target spec release 1H 2018

More information available at: http://lightnvm.io