Maciej Maciejewski, Krzysztof Czuryło LinuxCon, Berlin ’16


Salvador Dalí, The Persistence of Memory


Establishing the Open Industry NVM Programming Model

36+ Member Companies

http://snia.org/sites/default/files/NVMProgrammingModel_v1.pdf

SNIA Technical Working Group

Defined 4 programming modes required by developers:

  • Interfaces for a PM-aware file system accessing kernel PM support
  • Interfaces for an application accessing a PM-aware file system
  • Kernel support for block NVM extensions
  • Interfaces for legacy applications to access block NVM extensions

Spec 1.0 developed, approved by SNIA voting members and published


Persistent memory programming model

Diagram labels, by layer:

  • User Space: Application (three instances), Management UI, Management Library
  • Access paths: Standard File API, Standard Raw Device Access, Load/Store
  • Kernel Space: File System, pmem-Aware File System, MMU Mappings, NVDIMM Driver
  • NVDIMM Hardware


OPERATING SYSTEM


ACPI NFIT


E820


What can we do with it?


Memory


Memory - libvmmalloc

Diagram labels: Application, malloc, vmmalloc, constructor, vmem_pool_create, vmem_pool_malloc, libvmem, jemalloc, temporary file, mmap, User Space, Kernel Space, PM, DRAM.

Memory - libvmem

Diagram labels: Application, vmem_create, vmem_malloc, vmem, libvmem, jemalloc (3.6.0), temporary file, fallocate/mmap, Persistent Memory; alongside it, malloc, libc (or other), mmap/sbrk, DRAM; User Space, Kernel Space.
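The "temporary file" plus "fallocate/mmap" path in the diagram can be illustrated with plain POSIX calls. The following is only a sketch of that idea, not libvmem's implementation (which, per the slide, builds on jemalloc): it maps an unlinked temporary file and hands out memory with a simple bump pointer. The names tmp_pool, tmp_pool_create, tmp_pool_alloc and tmp_pool_delete are hypothetical, and ftruncate is used here in place of fallocate for portability.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical volatile pool backed by an unlinked temporary file. */
struct tmp_pool {
    unsigned char *base; /* start of the mapping */
    size_t size;         /* total bytes mapped */
    size_t used;         /* bump pointer: bytes handed out so far */
};

/* Create a temporary file in `dir`, size it, and map it. Returns 0 on success. */
static int tmp_pool_create(struct tmp_pool *p, const char *dir, size_t size)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/pool.XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file vanishes once the mapping is gone */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps the file alive */
    if (addr == MAP_FAILED)
        return -1;
    p->base = addr;
    p->size = size;
    p->used = 0;
    return 0;
}

/* Hand out `n` bytes, rounded up to 16; returns NULL when the pool is full. */
static void *tmp_pool_alloc(struct tmp_pool *p, size_t n)
{
    size_t aligned = (n + 15u) & ~(size_t)15u;
    assert(p->used % 16 == 0);
    if (aligned < n || aligned > p->size - p->used)
        return NULL;
    void *ptr = p->base + p->used;
    p->used += aligned;
    return ptr;
}

/* Unmap the pool; its contents are discarded, as with a volatile allocator. */
static void tmp_pool_delete(struct tmp_pool *p)
{
    munmap(p->base, p->size);
    p->base = NULL;
    p->size = p->used = 0;
}
```

Pointing `dir` at a directory on a persistent-memory-aware file system would place the pool on PM while regular malloc keeps using DRAM. A bump pointer cannot free individual blocks, which is one reason a real allocator is used instead.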


Memory - https://github.com/memkind/memkind


Block storage


Byte persistency


Data persistency


Atomicity


Flushing to Persistence

open(…);
mmap(…);
strcpy(pmem, "Hello");
msync(pmem, 6, MS_SYNC);   /* flush with the standard API */
pmem_persist(pmem, 6);     /* or flush with libpmem instead */

Crossing the 8-Byte Store

strcpy(pmem, "Hello, World!");
pmem_persist(pmem, 14);

Crash at some point during this sequence. Result?

1. "\0\0\0\0\0\0\0\0\0\0..."
2. "Hello, W\0\0\0\0\0\0..."
3. "\0\0\0\0\0\0\0\0orld!\0"
4. "Hello, \0\0\0\0\0\0\0\0"
5. "Hello, World!\0"
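One way to reason about the question is to enumerate crash outcomes under a simplified model. The sketch below assumes that the memory starts zero-filled, that the 14 bytes are written as two aligned 8-byte stores, that each store is atomic, and that either store may or may not have reached the media at the time of the crash. These are assumptions made for illustration (a real strcpy may use different store widths), and the helper names crash_image and reachable are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define UNIT 8     /* assumed atomic store size in bytes */
#define BUF_LEN 16 /* two 8-byte units cover the 14-byte string */

/*
 * Build the image of persistent memory after a crash in which only the
 * 8-byte units selected by `persisted_mask` reached the media.
 * Bit k set means unit k (bytes 8k .. 8k+7) was persisted.
 */
static void crash_image(unsigned persisted_mask, unsigned char out[BUF_LEN])
{
    unsigned char intended[BUF_LEN] = {0};

    assert(persisted_mask < (1u << (BUF_LEN / UNIT)));
    memcpy(intended, "Hello, World!", 14); /* 13 characters + '\0' */
    memset(out, 0, BUF_LEN);               /* memory starts zero-filled */
    for (unsigned k = 0; k < BUF_LEN / UNIT; k++)
        if (persisted_mask & (1u << k))
            memcpy(out + k * UNIT, intended + k * UNIT, UNIT);
}

/* Returns 1 if some subset of persisted units yields exactly `candidate`. */
static int reachable(const unsigned char candidate[BUF_LEN])
{
    unsigned char img[BUF_LEN];

    for (unsigned mask = 0; mask < (1u << (BUF_LEN / UNIT)); mask++) {
        crash_image(mask, img);
        if (memcmp(img, candidate, BUF_LEN) == 0)
            return 1;
    }
    return 0;
}
```

Under this model, candidates 1, 2, 3 and 5 are reachable, while candidate 4 is not, because it would require only 7 bytes of the first 8-byte store to be persisted.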


Location

files/pools

mmap(2)

allocation mechanism

bookkeeping

replication & recovery
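The first two items, a file used as a pool and mmap(2), can be sketched with standard POSIX calls only. This is a minimal illustration under stated assumptions, not how NVML manages pools: the helper names pool_map and pool_store are hypothetical, the file is resized to the requested length on every open, and flushing uses msync rather than pmem_persist.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `len` bytes of the file at `path`, creating or resizing it as needed. */
static void *pool_map(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}

/* Copy a string to the start of the pool and flush it. Returns 0 on success. */
static int pool_store(void *pool, size_t pool_len, const char *s)
{
    size_t n = strlen(s) + 1;

    assert(n <= pool_len);
    memcpy(pool, s, n);
    return msync(pool, n, MS_SYNC); /* `pool` is page-aligned, as msync requires */
}
```

Mapping the same path again after munmap returns the stored string, which is the property the remaining items in the list (allocation, bookkeeping, replication and recovery) build on.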


Diagram labels: Application and NVML in User Space; file, Standard File API, Load/Store; PM-aware File System, MMU Mappings and NVDIMM Driver in Kernel Space; Persistent Memory.

http://pmem.io

https://github.com/pmem/nvml/

NVML


NVML Persistent Libraries

  • libpmem - Basic persistency handling
  • libpmemblk - Block access to persistent memory
  • libpmemlog - Log file on persistent memory (append-mostly)
  • libpmemobj - Transactional object store on persistent memory
  • libpmempool - Pool management utilities
  • librpmem - Replication


https://github.com/pmem/nvml/tree/master/src/examples

http://pmem.io/blog/


Applications


Modifying application allocations

Which objects to store in PM?

How to decide whether it is better to allocate/store in HBM, DRAM, PM, SSD or HDD?

Does everything need to be persistent?

When to guarantee persistence?


Modifying application engine

Before:

for (int i = 0; i < NUMBER_OF_ITERATIONS; i++) {
    result = calculateThis(i, result);
}

With persistent memory (i_pm and result_pm are kept in persistent memory):

i_pm = 0; // at first runtime of app only

...

...

for (; i_pm < NUMBER_OF_ITERATIONS; i_pm++) {
    TX_BEGIN(pool) {
        result_pm = calculateThis(i_pm, result_pm);
    } TX_END
}
i_pm = 0; // reset once the loop has completed
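The resumable-loop idea can be tried without persistent memory hardware by keeping the counter and the result in a file-backed mapping. The sketch below swaps the slide's transactional API for plain mmap and msync, so it does not reproduce the atomicity of TX_BEGIN/TX_END: a failure between the two stores could leave the result updated and the counter not. The names loop_state and run_resumable are hypothetical, the body of calculateThis is a placeholder because the slide does not give it, and the iteration count is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define NUMBER_OF_ITERATIONS 100 /* arbitrary value for the sketch */

/* State kept in the mapped file; a new file is zero-filled, so i starts at 0. */
struct loop_state {
    uint64_t i;
    uint64_t result;
};

/* Placeholder computation; the slide does not define calculateThis(). */
static uint64_t calculateThis(uint64_t i, uint64_t result)
{
    return result * 31u + i;
}

/*
 * Continue the loop from wherever the state file says it stopped.
 * `max_steps` caps the iterations done in this call to imitate an
 * interrupted run. Returns 0 when finished, 1 if stopped early, -1 on error.
 */
static int run_resumable(const char *path, uint64_t max_steps, uint64_t *result_out)
{
    assert(result_out != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)sizeof(struct loop_state)) != 0) {
        close(fd);
        return -1;
    }
    struct loop_state *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    close(fd);
    if (st == MAP_FAILED)
        return -1;

    uint64_t steps = 0;
    while (st->i < NUMBER_OF_ITERATIONS && steps < max_steps) {
        st->result = calculateThis(st->i, st->result);
        st->i++;
        if (msync(st, sizeof *st, MS_SYNC) != 0) { /* flush after each step */
            munmap(st, sizeof *st);
            return -1;
        }
        steps++;
    }
    int done = st->i >= NUMBER_OF_ITERATIONS;
    *result_out = st->result;
    munmap(st, sizeof *st);
    return done ? 0 : 1;
}
```

Calling run_resumable twice on the same path, first with a small max_steps and then with a large one, yields the same result as an uninterrupted loop. Unlike the slide's version, the sketch does not reset the counter at the end, so the state file has to be removed before a fresh run.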


Redis key/value store


*All of the following results come from an old developer machine with persistent memory emulated on DDR3.

Chart: Startup time. Y axis: seconds (0 to 30); X axis: object size (32 to 8192). Series: RDB, AOF, PM.

Chart: DRAM usage. Y axis: DRAM allocations [MB] (0 to 8000); X axis: object size (32 to 8192). Series: AOF, RDB, PM.

Chart: Running out of DRAM. Y axis: operations/sec (0 to 120000); X axis: % of OS memory (36 to 141). Series, added one per slide over four slides: No Persist, RDB, AOF, PM.

Chart: Redis, Transactional API. Axis values run from 0 to 100000 against 32 to 8192 (axis labels not preserved). Series, added over three slides: pmem, AOF, PM 4x, AOF 4x, PM 10x, AOF 10x.

Chart: Total Data throughput. Y axis: bytes/second, in billions (0 to 2); X axis: object size [B] (0 to 9000). Series: pmem, AOF, pmem 10x, AOF 10x.


PETSc

Portable, Extensible Toolkit for Scientific Computation

A suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations.

It employs the Message Passing Interface (MPI) standard for all message-passing communication.

PETSc is the world’s most widely used parallel numerical software library for partial differential equations and sparse matrix computations with over 760 publications.

PETSc includes a large suite of parallel linear and nonlinear equation solvers that are easily used in application codes written in C, C++, Fortran and now Python.

PETSc provides the mechanisms needed within parallel application code to overlap communication and computation.

https://www.mcs.anl.gov/petsc/


Sparse Matrix multiplication

                    Original               Persistent Memory
Time (sec)          2.070e+03              2.109e+03
Memory              1.361e+10              1.048e+06

                    Original               Persistent Memory
Event               Count  Time (sec)      Count  Time (sec)
MatAssemblyBegin    4      1.4782e-05      2      7.1526e-06
MatAssemblyEnd      4      2.8858e+00      2      2.6408e+00
MatLoad             2      1.6146e+01      2      3.6855e-03
MatMatMultSym       1      9.0468e+02      1      9.1570e+02
MatMatMultNum       1      1.1468e+03      1      1.1929e+03

Compute 2% slower

Preparation 680% faster


Sparse matrix solver

Time [sec] per event:

Event               Original      Matrices in PM   All in PM
MatAssemblyBegin    2.6226e-06    1.6443e-02       1.6618e-02
MatAssemblyEnd      1.1275e-01
MatLoad             1.7201e+00
MatMult             9.3791e-02    1.1111e-01       1.1177e-01
VecSet              5.0273e-03    4.6258e-03       9.6970e-03
MatMult             2.6623e+01    2.8851e+01       2.9268e+01
MatView             9.5606e-05    9.3222e-05       1.0443e-04
VecMDot             7.0743e+00    7.0108e+00       8.3750e+00
VecNorm             6.4105e-01    6.4236e-01       6.4869e-01
VecScale            7.0098e-01    7.0180e-01       7.0580e-01
VecCopy             1.5669e-02    1.5939e-02       1.5724e-02
VecSet              4.0993e-02    4.0953e-02       8.0090e-02
VecAXPY             6.5127e-02    6.6045e-02       6.8618e-02
VecMAXPY            1.0328e+01    1.0488e+01       8.7134e+00
VecPointwiseMult    1.1661e+00    1.2202e+00       1.3907e+00
VecNormalize        1.3422e+00    1.3443e+00       1.3546e+00
KSPGMRESOrthog      1.6762e+01    1.6844e+01       1.6556e+01
KSPSetUp            4.5981e-03    4.6084e-03       2.6381e+00
KSPSolve            4.6762e+01    4.9142e+01       6.7208e+01
PCSetUp             2.6226e-06    2.3842e-06       2.8610e-06
PCApply             1.2530e+00    1.3039e+00       1.4794e+00

Stage 1: File loading

Stage 2: Vector duplication and multiplication

Stage 3: Solver stage

Chart: GMRES Sparse Matrix solver

                     Time [s]   Memory [MB]
Original             48.67      557
Matrix data in PM    49.33      260
All in PM            74.89      7


No universal recipe

Different data usage scenarios

A lot of architectural work

Even more coding

Credit: Uwe Kils http://www.ecoscope.com/iceberg/

Iceberg image labels: SNIA, Programming models, NVML, HW, RDMA, RAS, Language bindings, Replication, OS addressing space limit, OS boot up / hibernation, POSIX, Wear leveling, JVM memory management, Virtualization, Space management, TLB