ERIS: A NUMA-Aware In-Memory Storage Engine for Tera-Scale Analytical Workload

Thomas Kissinger, Tim Kiefer, Benjamin Schlegel, Dirk Habich, Daniel Molka, Wolfgang Lehner

ADMS 2014, Hangzhou, China, 2014/09/01

© Prof. Dr.-Ing. Wolfgang Lehner


| 2

Motivation

Databases in the many-core era

[Figure: scan throughput [TiB/s] and lookup throughput [billion/s] over #cores (0 to 448); series: Shared Lookup, ERIS Lookup, Shared Scan, ERIS Scan.]

| 3

NUMA Systems

AMD machine: 8 nodes, 64 cores, 64 GB, max. 2 hops

Latency: 85 ns local, 196 ns remote

Bandwidth: 16.4 GB/s local, 1.8 GB/s remote

| 4

NUMA Systems

AMD machine: 8 nodes, 64 cores, 64 GB, max. 2 hops

Latency: 85 ns local, 196 ns remote

Bandwidth: 16.4 GB/s local, 1.8 GB/s remote

SGI machine: 64 nodes, 512 cores, 8 TB, max. 4 hops

Latency: 81 ns local, 870 ns remote

Bandwidth: 36.2 GB/s local, 6.5 GB/s remote

| 7

In a nutshell

In-memory storage engine for multi-socket, multi-core systems with large main memories (NUMA)

Key-value store with high lookup, scan, and insert performance to support analytical workloads

Predominantly a shared-nothing distributed system (one partition per core)

Implemented as a partitioned prefix tree and a partitioned column store

Experiments show linear scalability, superior memory and link usage, and efficient load balancing

| 8

ERIS Implementation

| 9

ERIS Data Structures

Data structures: prefix tree (lookup) and column store (scan)

[Figure: left, a prefix tree whose nodes each have 16 slots (0 to 15), one tree level per 4 bits of the key, with key-value pairs at the leaves; right, a column store holding a sequence of values.]
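The prefix tree on this slide can be illustrated with a minimal sketch. It assumes fixed-width 32-bit unsigned keys and Python lists as nodes; only the 4-bit step per level (16 slots per node) comes from the slide. The names `PrefixTree`, `upsert`, and `lookup` are hypothetical and are not the ERIS API.

```python
from typing import Any, Iterator, List, Optional

BITS_PER_LEVEL = 4                   # the slide labels each tree level "4bit"
FANOUT = 1 << BITS_PER_LEVEL         # 16 slots per node (0..15)
KEY_BITS = 32                        # assumed key width, not stated on the slide
LEVELS = KEY_BITS // BITS_PER_LEVEL  # 8 levels for 32-bit keys


class PrefixTree:
    """Fixed-depth prefix tree over unsigned integer keys (illustrative only)."""

    def __init__(self) -> None:
        self.root: List[Any] = [None] * FANOUT

    @staticmethod
    def _digits(key: int) -> Iterator[int]:
        # Yield the 4-bit digits of the key, most significant first.
        if not 0 <= key < (1 << KEY_BITS):
            raise ValueError("key out of range")
        for level in range(LEVELS):
            shift = KEY_BITS - BITS_PER_LEVEL * (level + 1)
            yield (key >> shift) & (FANOUT - 1)

    def upsert(self, key: int, value: Any) -> None:
        digits = list(self._digits(key))
        node = self.root
        for d in digits[:-1]:
            if node[d] is None:
                node[d] = [None] * FANOUT   # allocate inner nodes lazily
            node = node[d]
        node[digits[-1]] = (key, value)     # leaf slot holds the key-value pair

    def lookup(self, key: int) -> Optional[Any]:
        digits = list(self._digits(key))
        node = self.root
        for d in digits[:-1]:
            node = node[d]
            if node is None:
                return None                 # prefix not present
        entry = node[digits[-1]]
        return entry[1] if entry is not None else None
```

Because the depth is fixed by the key width, a lookup touches the same number of nodes regardless of how many keys are stored, which is one reason such trees are attractive as in-memory indexes.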

| 10

Partitioned Prefix Tree

Prefix Tree Layout

[Figure: two layout variants, Direct Access and Indirect Access.]

| 14

ERIS Architecture

[Figure: multiprocessors 1 to M, each with cores 1 to N, local memory, and a local memory manager; every core runs an AEU. Shared components: NUMA-optimized high-throughput data command routing, Global Partition Table (GPT), monitoring, load balancer, partition transfer.]

Autonomous Execution Unit (AEU), per core, in local memory:

Local command buffer

AEU's partitions (column store, index)

Process data commands (i.e., scan, lookup, and insert/upsert)

Process balancing commands

Group data commands
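The AEU responsibilities named on this slide (local command buffer, grouping of data commands, processing of data and balancing commands) can be pictured with a small single-threaded model. Everything concrete here is an assumption made for illustration: the `Command` shape, the command names, the dictionary standing in for the partition's index, and the choice to apply upserts before lookups within one batch.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, List, Tuple


@dataclass
class Command:
    kind: str          # "lookup", "upsert", "scan", or "balance" (hypothetical names)
    key: int = 0
    value: Any = None


@dataclass
class AEU:
    index: Dict[int, Any] = field(default_factory=dict)    # stand-in for the partition's index
    buffer: Deque[Command] = field(default_factory=deque)  # local command buffer
    balancing_seen: int = 0                                 # balancing commands handled so far

    def submit(self, cmd: Command) -> None:
        self.buffer.append(cmd)

    def run_once(self) -> List[Tuple[str, int, Any]]:
        # Drain the buffer and group the commands by kind.
        groups: Dict[str, List[Command]] = defaultdict(list)
        while self.buffer:
            cmd = self.buffer.popleft()
            groups[cmd.kind].append(cmd)
        # Balancing commands are only counted in this sketch.
        self.balancing_seen += len(groups["balance"])
        # Assumption: upserts of a batch are applied before its lookups and scans.
        for cmd in groups["upsert"]:
            self.index[cmd.key] = cmd.value
        results: List[Tuple[str, int, Any]] = [
            ("lookup", c.key, self.index.get(c.key)) for c in groups["lookup"]
        ]
        for _ in groups["scan"]:
            results.append(("scan", 0, list(self.index.values())))
        return results
```

Grouping commands of the same kind lets each group run as one tight loop over the partition, at the cost of reordering commands inside a batch.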

| 16

Load Balancing

Load Balancer Implementation

[Figure: Multiprocessor 1 and Multiprocessor 2, each with local memory and two AEUs.]

Intra-node transfer: link

Inter-node transfer: copy, using a transfer command and a raw data stream
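The distinction between linking and copying a partition can be modelled in a few lines. This is a sketch under assumptions, not the ERIS implementation: `Partition`, `AEU`, and `transfer` are hypothetical names, and a Python list copy stands in for streaming raw data into another node's local memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    key_range: range
    values: List[int]


@dataclass
class AEU:
    aeu_id: int
    node: int                                   # multiprocessor (NUMA node) of this AEU
    partitions: List[Partition] = field(default_factory=list)


def transfer(part: Partition, src: AEU, dst: AEU) -> str:
    """Move `part` from `src` to `dst`; return which mechanism was used."""
    # Detach by identity so an equal-looking partition is never removed by mistake.
    src.partitions = [p for p in src.partitions if p is not part]
    if src.node == dst.node:
        # Intra-node: the data already lives in the shared local memory,
        # so handing over the reference is enough ("link").
        dst.partitions.append(part)
        return "link"
    # Inter-node: the data has to be rebuilt in the target's local memory ("copy").
    dst.partitions.append(Partition(part.key_range, list(part.values)))
    return "copy"
```

The linked case moves no data at all, which is why balancing within a multiprocessor is much cheaper than balancing across multiprocessors.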

| 17

Load Balancing

Load Balancer Strategies


| 19

ERIS Evaluation

| 20

Evaluation

Lookup/Upsert Throughput Depending on Index Size

[Figure: lookup and upsert throughput as a function of index size; panels: AMD machine, SGI machine; rows: Lookup, Upsert.]

| 21

Evaluation

Scan Performance

• SGI machine

• 488 cores, parallel scan

• 8 billion entries in the column store

Scan bandwidth [GB/s]: ERIS 2094.1, Interleaved 273.6, Single RAM 33.8
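To put the three bars in relation, the following short calculation derives the speedup of ERIS over the two baselines and the average bandwidth per core. The numbers are the bar labels from this slide; reading them as ERIS, Interleaved, and Single RAM in that order is an assumption based on the order of the labels.

```python
# Scan bandwidth in GB/s as labelled on the slide (label-to-bar order assumed).
bandwidth_gbs = {"ERIS": 2094.1, "Interleaved": 273.6, "Single RAM": 33.8}
cores = 488  # number of cores used for the parallel scan

# Speedup of ERIS over each baseline configuration.
speedup = {
    name: bandwidth_gbs["ERIS"] / bw
    for name, bw in bandwidth_gbs.items()
    if name != "ERIS"
}
# Average scan bandwidth contributed by each core under ERIS.
per_core_gbs = bandwidth_gbs["ERIS"] / cores

for name, factor in speedup.items():
    print(f"ERIS vs. {name}: {factor:.1f}x")
print(f"ERIS per core: {per_core_gbs:.2f} GB/s")
```

Under that reading, ERIS is roughly 7.7 times faster than the interleaved placement and roughly 62 times faster than placing all data in a single node's RAM.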

| 22

Evaluation

Link and Memory Controller Activity

• AMD machine

• Scan: 8B keys

• Lookup: 1B keys

[Figure: memory controller (MEM) and link (LINK) bandwidth [GB/s] for scan and lookup, ERIS vs. Shared; bar labels as extracted: 33.8, 41.6, 75.6, 83.8, 122.9, 73.04, 1.2, 17.8.]


| 24

Data Command Routing

[Figure: AEUs 1 to N, each with local outgoing buffers and a local incoming buffer.]

1. Batch lookup of the target AEU(s) in the partition table (bitmap partition table or range partition table)

2. Fill the local outgoing buffers: one unicast buffer and multicast references per target AEU (AEU 1 to AEU N), plus a multicast buffer

3. Copy to the target's local incoming buffer

Local incoming buffer descriptor: Active (1 bit), Offset (32 bit), Active Writers (31 bit)

Local incoming buffer states: Fill, Processing
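The three routing steps (batch lookup of target AEUs, fill of the outgoing buffers, copy to the target) can be sketched for the unicast case as follows. This is a simplified model under assumptions: `RangePartitionTable` and `Router` are hypothetical names, commands are plain `(key, operation)` tuples, the batch size is arbitrary, and multicast buffers and the incoming buffer descriptor are left out.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Command = Tuple[int, str]  # (key, operation), an assumed command shape


class RangePartitionTable:
    def __init__(self, upper_bounds: List[int]) -> None:
        # upper_bounds[i] is the exclusive upper key bound of AEU i's range.
        self.upper_bounds = upper_bounds

    def target(self, key: int) -> int:
        i = bisect.bisect_right(self.upper_bounds, key)
        if i == len(self.upper_bounds):
            raise KeyError(f"no partition covers key {key}")
        return i


class Router:
    def __init__(self, table: RangePartitionTable, num_aeus: int, batch_size: int = 4) -> None:
        self.table = table
        self.batch_size = batch_size
        self.outgoing: Dict[int, List[Command]] = defaultdict(list)  # per-target outgoing buffers
        self.incoming: Dict[int, List[Command]] = {a: [] for a in range(num_aeus)}

    def route(self, commands: List[Command]) -> None:
        # Step 1: look up the target AEU for the whole batch.
        targets = [self.table.target(key) for key, _ in commands]
        # Step 2: fill the outgoing buffer of each target.
        for target, cmd in zip(targets, commands):
            buf = self.outgoing[target]
            buf.append(cmd)
            if len(buf) >= self.batch_size:
                self.flush(target)

    def flush(self, target: int) -> None:
        # Step 3: copy the buffered commands to the target's incoming buffer.
        self.incoming[target].extend(self.outgoing[target])
        self.outgoing[target].clear()

    def flush_all(self) -> None:
        for target in list(self.outgoing):
            self.flush(target)
```

Buffering commands per target turns many small remote writes into fewer, larger copies, which is the effect the local buffer size controls.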

| 25

Data Command Routing – Evaluation

[Figure: routing throughput [million routings/s] (0 to 1200) over local buffer size [#requests] (0 to 512); series: Raw Routing, Routing w/ Index Lookups.]

| 26

Evaluation

L3 Cache Usage – Index Lookup

[Figure: L3 cache miss ratio [%] and throughput [million/s] over #keys (16M to 2B); series: ERIS, Shared, ERIS L3 Cache, Shared L3 Cache.]

| 27

Evaluation

L3 Cache Line State – Index Lookup

• Percentage of all hits

• 1B keys

Cache line state [% of hits], ERIS: Modified 19.4, Exclusive 76.6, Forward 2.1, Shared 1.9

Cache line state [% of hits], Shared Index: Modified 16.3, Exclusive 4.5, Forward 20.9, Shared 58.4

| 28

Evaluation

Load Balancer Experiments
