Dynamic Near Data Processing Framework for SSDs

Gunjae Koo*, Kiran Kumar Matam*, Te I†, H.V. Krishina Giri Nara*, Jing Li‡, Hung-Wei Tseng†, Steven Swanson‡, Murali Annavaram*

*University of Southern California
†North Carolina State University
‡University of California, San Diego

Source: nvmw.ucsd.edu/.../unzip/current/nvmw2018-paper55-presentations-sl…

Conventional Storage = Cheap Passive Devices

Conventional storage devices
• Slow, limited bandwidth (SATA 150 ~ 600 MB/s)
• Passive devices (read, write, erase)

* Figures from Intel and Western Digital

Storage in Modern Server Systems

Storage devices for Big Data
• Huge volumes of data (figure labels: slow, slower, much slower)
• Data movement is critical for performance

Intelligent Storage

NVM-based storage devices
• No seek time, higher bandwidth over PCIe
• Potential to be active systems

* Figures from Intel

[Figure: SSD internals, showing the SSD processor, DRAM, and NAND flash packages]

Near Data Processing (NDP)

[Figure: a host CPU connected through the storage interface to the storage processor (SP). Legend: data computation @ host; data transfer from storage, internal and external (host – storage).]

[Figure, next build: the same diagram, adding a W/O NDP timeline and a With NDP timeline; the With NDP case introduces data computation @ storage.]

Near Data Processing (NDP) on SSDs

[Figure: the same host/SP diagram with the W/O NDP and With NDP timelines. Additional labels: garbage collection, wear-leveling, data computation @ storage.]

Obstacles to in-SSD processing

• Less powerful embedded processor

• Dynamic computation resource availability

• Manual workload partitioning is difficult

Summarizer: Dynamic NDP framework for SSD

Summarizer – Basic Concept

[Figure: a host CPU connected through the storage interface to the AP, annotated "Monitoring resources"]

Summarizer – Detailed Firmware Architecture

[Figure: block diagram with the following components]
• Host side: User Applications / Operating Systems, NVMe Host Driver, Host CPU, Host Memory (SQ, CQ)
• Storage Interface (PCIe / NVMe)
• SSD side, command path: I/O Controller (NVMe command decoder), Request queue, Response queue
• SSD side, firmware: SSD Firmware, Flash Translation Layer (FTL), Summarizer, Task Controller, TQ, User Functions
• SSD side, hardware: SSD Embedded Processors, SSD SoC Interconnection, DRAM Controller, SSD DRAM, Flash Controller, NAND Flash (four packages)

Summarizer – Initialization (Function Offloading)

[Figure: the firmware architecture diagram, annotated with the following labels]
• Function offloading: INIT(foo), a new NVMe command carrying foo()
• Function registration: foo() stored under User Functions as f#1
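The registration step can be sketched as a small lookup table. This is a minimal Python sketch under stated assumptions, not the Summarizer firmware: `UserFunctionTable`, `init`, and `lookup` are hypothetical names. The only details taken from the slide are that INIT(foo) offloads foo() and that foo() then appears under User Functions as f#1.

```python
from typing import Callable, Dict

# Hypothetical signature for an offloaded function: one page of bytes in, one value out.
PageFn = Callable[[bytes], int]


class UserFunctionTable:
    """Illustrative stand-in for the "User Functions" box in the diagram."""

    def __init__(self) -> None:
        self._slot_by_name: Dict[str, int] = {}
        self._fn_by_slot: Dict[int, PageFn] = {}

    def init(self, name: str, fn: PageFn) -> int:
        """Model of INIT(name): register fn once and return its slot (f#1, f#2, ...)."""
        if name in self._slot_by_name:
            return self._slot_by_name[name]  # already registered, keep the old slot
        slot = len(self._fn_by_slot) + 1
        self._slot_by_name[name] = slot
        self._fn_by_slot[slot] = fn
        return slot

    def lookup(self, name: str) -> PageFn:
        """Find the registered function when a later command names it."""
        return self._fn_by_slot[self._slot_by_name[name]]


table = UserFunctionTable()
foo_slot = table.init("foo", lambda page: len(page))      # becomes f#1
goo_slot = table.init("goo", lambda page: page.count(0))  # becomes f#2
```

Registering by name and returning a slot number mirrors the f#1 / f#2 labels; how the real firmware identifies functions is not stated on the slide.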

Summarizer – Computation (Dynamic mode)

[Figure: the firmware architecture diagram, with foo() registered as f#1 and goo() as f#2, annotated with the following labels]
• New NVMe command: RD&PROC(LBA, foo)
• New NVMe command decode
• RD&PROC(PPA, foo)
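The step from RD&PROC(LBA, foo) to RD&PROC(PPA, foo) can be illustrated as a decode followed by a logical-to-physical lookup. The sketch below is an assumption-laden illustration: `ReadProc`, `decode_and_translate`, and the dictionary-based mapping table `l2p` are hypothetical names, and a real FTL mapping is far more involved than a dictionary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReadProc:
    """Hypothetical in-memory form of a read-and-process command."""
    space: str      # "LBA" (as issued by the host) or "PPA" (after translation)
    address: int
    function: str   # name of a previously registered user function


def decode_and_translate(cmd: ReadProc, l2p: Dict[int, int]) -> ReadProc:
    """Turn RD&PROC(LBA, f) into RD&PROC(PPA, f) using a logical-to-physical map."""
    if cmd.space != "LBA":
        raise ValueError("host commands are expected to carry an LBA")
    return ReadProc("PPA", l2p[cmd.address], cmd.function)


# Toy mapping table: logical block 100 lives at physical page 7.
l2p = {100: 7, 101: 3}
internal = decode_and_translate(ReadProc("LBA", 100, "foo"), l2p)
```

The function name travels unchanged through the translation, so the later stages can still find foo() in the user function table.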

[Figure, next build]
• RD&PROC(PPA, foo) is shown as per-page requests RD&P(PPA1, foo) and RD&P(PPA2, foo)
• Page data is shown for RD&P(PPA1, foo)
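The labels RD&P(PPA1, foo) and RD&P(PPA2, foo) suggest that one request is broken into per-page requests. A minimal sketch of that idea follows, assuming the request covers consecutive logical pages and that each page is translated independently; `split_into_page_requests` and the `l2p` dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

# (operation, physical page address, function name)
PageRequest = Tuple[str, int, str]


def split_into_page_requests(
    start_lba: int, num_pages: int, function: str, l2p: Dict[int, int]
) -> List[PageRequest]:
    """Emit one RD&P request per logical page covered by the original command."""
    return [
        ("RD&P", l2p[lba], function)
        for lba in range(start_lba, start_lba + num_pages)
    ]


# Toy mapping: two consecutive logical pages land on unrelated physical pages.
l2p = {10: 42, 11: 5}
requests = split_into_page_requests(10, 2, "foo", l2p)
```

Working at page granularity is what lets each page be handled on its own, in line with the "Page-level task control" point in the conclusion.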

[Figure, next build]
• Page data for RD&P(PPA1, foo) is paired with the function as (buf1, foo)
• CC/Proc
• Register in TQ

[Figure, next build]
• Page data for RD&P(PPA1, foo)
• CC
• TQ is full
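The "Register in TQ" and "TQ is full" labels point to a per-page decision. The sketch below shows one plausible reading, not the actual firmware: TQ is taken to be a bounded task queue, and a page that arrives while the queue is full is assumed to be sent to the host unprocessed. `TaskController`, `on_page_data`, `run_one_task`, and the capacity of 2 are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class TaskController:
    """Illustrative page-level dispatcher with a bounded task queue (TQ)."""

    def __init__(self, tq_capacity: int, functions: Dict[str, Callable[[bytes], int]]) -> None:
        self.tq_capacity = tq_capacity
        self.functions = functions
        self.tq: Deque[Tuple[bytes, str]] = deque()
        self.to_host: List[bytes] = []   # pages returned raw for host-side processing
        self.results: List[int] = []     # per-page results computed inside the SSD

    def on_page_data(self, buf: bytes, fn_name: str) -> str:
        """Called when a flash read completes; decide where the page is processed."""
        if len(self.tq) < self.tq_capacity:
            self.tq.append((buf, fn_name))  # "Register in TQ"
            return "ssd"
        self.to_host.append(buf)            # "TQ is full": assumed fallback to the host
        return "host"

    def run_one_task(self) -> None:
        """Model the embedded processor draining one queued task."""
        buf, fn_name = self.tq.popleft()
        self.results.append(self.functions[fn_name](buf))


tc = TaskController(tq_capacity=2, functions={"foo": len})
decisions = [tc.on_page_data(b"page%d" % i, "foo") for i in range(3)]
tc.run_one_task()                                   # frees one TQ slot
decisions.append(tc.on_page_data(b"page3", "foo"))  # fits in the TQ again
```

Because the decision depends only on queue occupancy at the moment a page arrives, the split between in-SSD and host processing adapts as the embedded processor gets busier or idler, which matches the "dynamic" framing of the slides.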

Summarizer – Finalization

[Figure: the firmware architecture diagram, annotated with the following labels]
• New NVMe command: FINAL(foo)
• Results
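One way to picture finalization is the host merging the results returned by FINAL(foo) with whatever it computed itself on pages that came back unprocessed. The sketch below assumes that foo() is an aggregation whose per-page results can be merged by addition; that property, and the names `foo` and `combine`, are illustrative assumptions rather than details from the slides.

```python
from typing import List


def foo(page: List[int]) -> int:
    """Hypothetical user function: a per-page partial sum."""
    return sum(page)


def combine(ssd_partials: List[int], host_pages: List[List[int]]) -> int:
    """Merge in-SSD results with host-side results for pages the SSD did not process."""
    host_partials = [foo(page) for page in host_pages]
    return sum(ssd_partials) + sum(host_partials)


pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ssd_partials = [foo(pages[0]), foo(pages[1])]  # pages processed inside the SSD
total = combine(ssd_partials, [pages[2]])      # third page came back raw
```

The merged total equals what a host-only run over all three pages would produce, so the answer does not depend on where each page was processed.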

Evaluation Platform

• LS2085a intelligent SSD development platform
• ARM cores running FTL and Summarizer firmware
• FPGA implementing NAND flash controller
• PCIe Gen. 3, 4 lanes, for host communication

[Figure: LS2085a block diagram. CPU cores, each with L1D (32KB) and L1I (48KB), and L2 (1MB) caches; Interconnection; DDR4 Memory Controller with DRAM; PCIe (host – LS2085a); PCIe (LS2085a – FPGA); FPGA (Altera Stratix V); NAND flash DIMMs]

[Figure: board view labeled ARM Processor, DRAM, Altera Stratix V, NAND flash DIMMs, and PCIe (to host)]

Evaluation - Performance

[Figure: TPC-H Query6 chart with series Static and Dynamic, each split into SSD time and Host time; y-axis ticks 0 to 4, x-axis ticks 0 to 1. Annotation: Static workload offloading]

[Figure annotations added on the following builds]
• CPU only processing (baseline) and SSD only processing
• Summarizer Dynamic Offloading
• SSD time: SSD processing + transfer time (internal + external + in-SSD processing)
• Host time: host CPU processing time
• Execution time normalized to baseline (CPU only)

[Figure: execution time (normalized to baseline) for CPU only vs. Dynamic, split into SSD time and Host time. Data labels: 0.70, 0.60, 0.30, 0.24 (the next build shows 0.62 in place of 0.60). The final build overlays the W/O NDP and With NDP timelines: data computation @ host, data transfer from storage (internal, external), data computation @ storage.]
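The data labels can be turned into a total execution time with a little arithmetic. The sketch below assumes, without confirmation from the extracted text, that the labels pair up as CPU only = 0.70 SSD time + 0.30 host time and Dynamic = 0.60 (or 0.62) SSD time + 0.24 host time. Values are held as hundredths of the baseline so the sums stay exact; the names are hypothetical.

```python
from typing import Dict

Bar = Dict[str, int]  # time components in hundredths of the CPU-only baseline


def total(bar: Bar) -> int:
    """Total normalized execution time of one stacked bar."""
    return bar["ssd"] + bar["host"]


def reduction_percent(bar: Bar, base: Bar) -> float:
    """Execution-time reduction of bar relative to base, in percent."""
    return 100 * (total(base) - total(bar)) / total(base)


# Assumed pairing of the slide's data labels (not confirmed by the source).
cpu_only = {"ssd": 70, "host": 30}
dynamic_with_060 = {"ssd": 60, "host": 24}
dynamic_with_062 = {"ssd": 62, "host": 24}

print(total(cpu_only))                                # 100, i.e. 1.00 normalized
print(reduction_percent(dynamic_with_060, cpu_only))  # 16.0
print(reduction_percent(dynamic_with_062, cpu_only))  # 14.0
```

Under this pairing the CPU-only components add up to exactly 1.00, which is consistent with a bar normalized to itself; which of the two Dynamic labels is the intended one cannot be settled from the text alone.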

Performance degraded by static NDP

Evaluation - Performance

[Figure: four panels, each with y-axis "Execution time (normalized to baseline)". Data labels: 16%, 10%, 20%, 7%]

Design Exploration – Better SSD Processor

[Figure: a host CPU connected through the storage interface to the AP]

Better embedded processor is cost effective

Design Exploration – Higher Internal Bandwidth

[Figure: speedup (y-axis, 0% to 120%) at X1, X2, X4, X8, and X16 for TPC-H Query6, TPC-H Query1, TPC-H Query14, String Similarity Join, and Average. x-axis: Embedded processor performance]

Summarizer is a cost effective NDP solution with powerful storage processors

Conclusion

▪ Dynamic computation offloading framework
• Opportunistic in-SSD computation
• Page-level task control
• Optimal performance improvement

▪ Summarizer programming model

✓ Dynamic NDP framework for SSDs
• Opportunistically enables in-SSD processing
• Page-level NDP control
• Automatic workload partitioning

✓ Summarizer programming model
• Evaluation on the real development platform
• Explored design space for future SSDs

Thank you

(We thank Dell EMC for supporting the SSD development board)

Summarizer: Trading Communication with Computing Near Storage (MICRO '17)