Maciej Maciejewski, Krzysztof Czuryło
LinuxCon, Berlin ’16
[Image: Salvador Dalí, The Persistence of Memory]
Establishing the Open Industry NVM Programming Model
36+ Member Companies
http://snia.org/sites/default/files/NVMProgrammingModel_v1.pdf
SNIA Technical Working Group defined the 4 programming modes required by developers
Spec 1.0 developed, approved by SNIA voting members and published
The four modes:
– interfaces for a PM-aware file system accessing kernel PM support
– interfaces for applications accessing a PM-aware file system
– kernel support for block NVM extensions
– interfaces for legacy applications to access block NVM extensions
Persistent memory programming model
[Diagram: SNIA persistent memory programming model. Applications reach NVDIMM hardware three ways: through the standard file API via a file system and the NVDIMM driver, through standard raw device access, or through direct load/store via MMU mappings into a pmem-aware file system. A management library and management UI sit alongside, all within the operating system.]
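To make the load/store path concrete, here is a minimal sketch (not from the slides; the file name and size are placeholders) of mapping a file from a PM-aware file system and storing to it directly:

/* Map a file from a PM-aware (DAX) file system and write to it with
 * ordinary stores instead of write(2). */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/mnt/pmem/data", O_CREAT | O_RDWR, 0666);
    if (fd < 0)
        return 1;
    if (ftruncate(fd, 4096) != 0)
        return 1;
    char *pmem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (pmem == MAP_FAILED)
        return 1;
    strcpy(pmem, "hello");          /* a plain CPU store               */
    msync(pmem, 4096, MS_SYNC);     /* flush to the persistence domain */
    munmap(pmem, 4096);
    close(fd);
    return 0;
}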
ACPI NFIT (NVDIMM Firmware Interface Table)
E820 (legacy BIOS memory map)
What can we do with it?
Memory
Memory - libvmmalloc
[Diagram: libvmmalloc. The library constructor creates a temporary file on persistent memory, mmaps it, and builds a pool on it (vmem_pool_create); the application's unchanged malloc calls are redirected (vmmalloc → vmem_pool_malloc, backed by libvmem/jemalloc), so allocations land in PM instead of DRAM.]
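As a hedged illustration (not from the slides): libvmmalloc is meant to be preloaded, so unmodified code like the following would draw its allocations from a temporary file on persistent memory. The pool directory, pool size, and invocation in the comment are illustrative.

/* Sketch: run unmodified malloc code under libvmmalloc, e.g.
 *   VMMALLOC_POOL_DIR=/mnt/pmem VMMALLOC_POOL_SIZE=1073741824 \
 *   LD_PRELOAD=libvmmalloc.so ./app
 */
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buf = malloc(64);   /* served from the pmem-backed pool   */
    if (buf == NULL)
        return 1;
    strcpy(buf, "volatile data placed on persistent memory");
    free(buf);                /* returned to the pool, not the heap */
    return 0;
}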
Memory - libvmem
[Diagram: libvmem (over jemalloc 3.6.0). The application creates a pool with vmem_create, which fallocates and mmaps a temporary file on persistent memory; vmem_malloc then allocates from that pool, while ordinary malloc from libc (or another allocator) keeps using DRAM via mmap/sbrk.]
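A minimal usage sketch, assuming the libvmem API as published in NVML (vmem_create / vmem_malloc / vmem_free); the directory is a placeholder:

#include <libvmem.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Create a volatile pool backed by a temporary file in the given
     * (pmem-mounted) directory; VMEM_MIN_POOL is the library's minimum. */
    VMEM *vmp = vmem_create("/mnt/pmem", VMEM_MIN_POOL);
    if (vmp == NULL) {
        perror("vmem_create");
        return 1;
    }
    char *buf = vmem_malloc(vmp, 64);   /* allocated from the pool */
    if (buf == NULL)
        return 1;
    strcpy(buf, "hello from a pmem-backed heap");
    puts(buf);
    vmem_free(vmp, buf);
    vmem_delete(vmp);                   /* tears the pool down */
    return 0;
}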
Memory - memkind (https://github.com/memkind/memkind)
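For memkind, a minimal sketch assuming its published C API (memkind_create_pmem / memkind_malloc); the directory is a placeholder:

#include <memkind.h>
#include <string.h>

int main(void) {
    struct memkind *pmem_kind;
    /* Create a file-backed kind in a pmem-mounted directory; a max size
     * of 0 lets the pool grow with the file system. */
    if (memkind_create_pmem("/mnt/pmem", 0, &pmem_kind) != 0)
        return 1;
    char *buf = memkind_malloc(pmem_kind, 64);
    if (buf == NULL)
        return 1;
    strcpy(buf, "allocated via memkind");
    memkind_free(pmem_kind, buf);
    memkind_destroy_kind(pmem_kind);
    return 0;
}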
Block storage
Byte persistence
Data persistence
Atomicity
Flushing to Persistence
open(…);
mmap(…);
strcpy(pmem, "Hello");
msync(pmem, 6, MS_SYNC);      /* or, on real pmem: */
pmem_persist(pmem, 6);
strcpy(pmem, "Hello, World!");
pmem_persist(pmem, 14);

Crossing the 8-byte store – what can pmem hold after a crash?
1. "\0\0\0\0\0\0\0\0\0\0..."
2. "Hello, W\0\0\0\0\0\0..."
3. "\0\0\0\0\0\0\0\0orld!\0"
4. "Hello, \0\0\0\0\0\0\0\0"
5. "Hello, World!\0"
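The pmem_persist call above comes from NVML's libpmem. A self-contained sketch of the same pattern, assuming a libpmem version that provides pmem_map_file (the path is a placeholder):

#include <libpmem.h>
#include <string.h>

int main(void) {
    size_t mapped_len;
    int is_pmem;
    /* Create/map a 4 KiB file; libpmem reports via is_pmem whether the
     * mapping is true persistent memory. */
    char *pmem = pmem_map_file("/mnt/pmem/hello", 4096, PMEM_FILE_CREATE,
                               0666, &mapped_len, &is_pmem);
    if (pmem == NULL)
        return 1;
    strcpy(pmem, "Hello, World!");
    if (is_pmem)
        pmem_persist(pmem, 14);   /* CPU cache flush, no syscall */
    else
        pmem_msync(pmem, 14);     /* falls back to msync(2)      */
    pmem_unmap(pmem, mapped_len);
    return 0;
}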
– Location: files/pools, mmap(2)
– Allocation mechanism
– Bookkeeping
– Replication & recovery
[Diagram: NVML in user space. The application opens a file on a PM-aware file system through the standard file API, the NVDIMM driver and MMU mappings make it directly addressable, and NVML then accesses persistent memory with plain loads and stores.]
NVML persistent libraries:
libpmem – basic persistence handling
libpmemblk – block access to persistent memory
libpmemlog – log file on persistent memory (append-mostly)
libpmemobj – transactional object store on persistent memory
libpmempool – pool management utilities
librpmem – replication
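To give a feel for the pool-based APIs, a minimal libpmemlog sketch (the pool path is a placeholder; PMEMLOG_MIN_POOL is the library's minimum pool size):

#include <libpmemlog.h>
#include <string.h>

int main(void) {
    /* Create an append-mostly log pool on persistent memory. */
    PMEMlogpool *plp = pmemlog_create("/mnt/pmem/log.pool",
                                      PMEMLOG_MIN_POOL, 0666);
    if (plp == NULL)
        return 1;
    const char *line = "appended atomically\n";
    if (pmemlog_append(plp, line, strlen(line)) != 0)  /* power-fail safe */
        return 1;
    pmemlog_close(plp);
    return 0;
}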
https://github.com/pmem/nvml/tree/master/src/examples
http://pmem.io/blog/
Applications
Modifying application allocations
Which objects to store in PM?
How to decide whether it is better to allocate/store in HBM, DRAM, PM, SSD, or HDD?
Do all objects need to be persistent?
When to guarantee persistence?
Modifying application engine
Before:
for (int i = 0; i < NUMBER_OF_ITERATIONS; i++) {
    result = calculateThis(i, result);
}

After:
i_pm = 0;   /* at first run of the application only */
...
for (; i_pm < NUMBER_OF_ITERATIONS; i_pm++) {
    TX_BEGIN(pool) {
        result_pm = calculateThis(i_pm, result_pm);
    } TX_END
}
i_pm = 0;
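A fuller, hedged sketch of the same idea with libpmemobj's transactional API; the layout name, pool path, iteration count, and calculateThis are illustrative, and the pool is assumed to exist (e.g. created with pmempool create):

#include <libpmemobj.h>

/* Root object holding the loop state that must survive restarts. */
struct my_root {
    int  i_pm;
    long result_pm;
};

POBJ_LAYOUT_BEGIN(engine);
POBJ_LAYOUT_ROOT(engine, struct my_root);
POBJ_LAYOUT_END(engine);

static long calculateThis(int i, long acc) { return acc + i; } /* stand-in */

int main(void) {
    PMEMobjpool *pop = pmemobj_open("/mnt/pmem/engine.pool",
                                    POBJ_LAYOUT_NAME(engine));
    if (pop == NULL)
        return 1;
    TOID(struct my_root) root = POBJ_ROOT(pop, struct my_root);

    while (D_RO(root)->i_pm < 1000 /* NUMBER_OF_ITERATIONS */) {
        TX_BEGIN(pop) {
            TX_ADD(root);   /* undo-log the root before modifying it */
            D_RW(root)->result_pm =
                calculateThis(D_RO(root)->i_pm, D_RO(root)->result_pm);
            D_RW(root)->i_pm++;
        } TX_END
    }
    pmemobj_close(pop);
    return 0;
}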
Redis key/value store
*All of the following results come from an older development machine with persistent memory emulated on DDR3 DRAM.
[Chart: Startup time – seconds vs. object size (32–8192 B) for RDB, AOF, and PM.]
[Chart: DRAM usage – DRAM allocations [MB] vs. object size (32–8192 B) for AOF, RDB, and PM.]
[Chart sequence: Running out of DRAM – operations/sec vs. % of OS memory in use (36–141%), with series added slide by slide: No Persist, then RDB, then AOF, then PM.]
[Chart sequence: Redis, Transactional API – operations/sec vs. object size (32–8192 B): pmem vs. AOF, then with PM 4x/AOF 4x, then PM 10x/AOF 10x.]
[Chart: Total data throughput – bytes/second (billions) vs. object size in bytes (0–9000) for pmem, AOF, pmem 10x, and AOF 10x.]
PETSc
Portable, Extensible Toolkit for Scientific Computation
A suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations.
It employs the Message Passing Interface (MPI) standard for all message-passing communication.
PETSc is the world's most widely used parallel numerical software library for partial differential equations and sparse matrix computations, with over 760 publications.
PETSc includes a large suite of parallel linear and nonlinear equation solvers that are easily used in application codes written in C, C++, Fortran, and now Python.
PETSc provides mechanisms needed within parallel application code that allow the overlap of communication and computation.
https://www.mcs.anl.gov/petsc/
Sparse Matrix multiplication
                   Original              Persistent Memory
Time (sec)         2.070e+03             2.109e+03
Memory             1.361e+10             1.048e+06

                   Count  Time (sec)     Count  Time (sec)
MatAssemblyBegin   4      1.4782e-05     2      7.1526e-06
MatAssemblyEnd     4      2.8858e+00     2      2.6408e+00
MatLoad            2      1.6146e+01     2      3.6855e-03
MatMatMultSym      1      9.0468e+02     1      9.1570e+02
MatMatMultNum      1      1.1468e+03     1      1.1929e+03

Compute 2% slower, preparation 680% faster
Sparse matrix solver – Time [sec]

                    Original      Matrices in PM   All in PM
Stage 1: File loading
MatAssemblyBegin    2.6226e-06    1.6443e-02       1.6618e-02
MatAssemblyEnd      1.1275e-01
MatLoad             1.7201e+00
Stage 2: Vector duplication and multiplication
MatMult             9.3791e-02    1.1111e-01       1.1177e-01
VecSet              5.0273e-03    4.6258e-03       9.6970e-03
Stage 3: Solver stage
MatMult             2.6623e+01    2.8851e+01       2.9268e+01
MatView             9.5606e-05    9.3222e-05       1.0443e-04
VecMDot             7.0743e+00    7.0108e+00       8.3750e+00
VecNorm             6.4105e-01    6.4236e-01       6.4869e-01
VecScale            7.0098e-01    7.0180e-01       7.0580e-01
VecCopy             1.5669e-02    1.5939e-02       1.5724e-02
VecSet              4.0993e-02    4.0953e-02       8.0090e-02
VecAXPY             6.5127e-02    6.6045e-02       6.8618e-02
VecMAXPY            1.0328e+01    1.0488e+01       8.7134e+00
VecPointwiseMult    1.1661e+00    1.2202e+00       1.3907e+00
VecNormalize        1.3422e+00    1.3443e+00       1.3546e+00
KSPGMRESOrthog      1.6762e+01    1.6844e+01       1.6556e+01
KSPSetUp            4.5981e-03    4.6084e-03       2.6381e+00
KSPSolve            4.6762e+01    4.9142e+01       6.7208e+01
PCSetUp             2.6226e-06    2.3842e-06       2.8610e-06
PCApply             1.2530e+00    1.3039e+00       1.4794e+00
GMRES Sparse Matrix solver

                    Time [s]   Memory [MB]
Original            48.67      557
Matrix data in PM   49.33      260
All in PM           74.89      7
No universal recipe
Different data usage scenarios
A lot of architectural work
Even more coding
[Image: iceberg. Credit: Uwe Kils, http://www.ecoscope.com/iceberg/]
The visible tip: SNIA programming models, NVML, HW.
Below the waterline: RDMA, RAS, language bindings, replication, OS addressing-space limits, OS boot-up/hibernation, POSIX, wear leveling, JVM memory management, virtualization, space management, TLB.