High-Throughput Subset Matching on Commodity GPU-Based Systems

Daniele Rogora∗, Michele Papalini$, Koorosh Khazaei∗, Alessandro Margara%, Antonio Carzaniga∗, Gianpaolo Cugola%

presented by Daniele Rogora

%Politecnico di Milano, Milano, Italy
∗Università della Svizzera italiana, Lugano, Switzerland
$Cisco Systems, Paris, France

EuroSys 2017
Subset Match

Useful in many scenarios

Social networks, Twitter

Data Center management

Service brokering

Cloud 3.0

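Example

Subscribers and their tag sets:
Daniele: {#football, #acmilan}, {#politics, #Italy}
Antonio: {#politics, #USA}, {#chomsky}
. . .

Messages, each carrying a tag set:
{#politics, #USA, #Italy}
{#acmilan, #closing, #news, #football}

A message is delivered to a subscriber when one of the subscriber's tag sets is a subset of the message's tag set. Here the first message reaches Daniele (through {#politics, #Italy}) and Antonio (through {#politics, #USA}); the second reaches Daniele (through {#football, #acmilan}).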

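The delivery rule from the example can be sketched with Python's built-in sets. This is a naive scan over all subscriptions, for illustration only; the names `match` and `Subscriptions` are hypothetical, and the subscriber-to-tag-set assignment is our reading of the slide.

```python
from typing import Dict, FrozenSet, List

# Subscriber -> list of tag sets (as read from the slide's table).
Subscriptions = Dict[str, List[FrozenSet[str]]]

def match(subs: Subscriptions, message: FrozenSet[str]) -> List[str]:
    """Subscribers with at least one tag set contained in the message's tag set."""
    return [name for name, tagsets in subs.items()
            if any(ts <= message for ts in tagsets)]  # `<=` is set inclusion

subs: Subscriptions = {
    "Daniele": [frozenset({"#football", "#acmilan"}), frozenset({"#politics", "#Italy"})],
    "Antonio": [frozenset({"#politics", "#USA"}), frozenset({"#chomsky"})],
}

print(match(subs, frozenset({"#politics", "#USA", "#Italy"})))                # ['Daniele', 'Antonio']
print(match(subs, frozenset({"#acmilan", "#closing", "#news", "#football"})))  # ['Daniele']
```

The cost of this scan grows with the number of stored tag sets, which is what the rest of the talk sets out to avoid.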
Tagsets Representation

Representation of tag sets with Bloom filters:
  a bit vector of size m
  k independent hash functions h1, . . . , hk, with hi : Tags → {1, . . . , m}

Example (k = 2, m = 10): to encode D = {politics, Italy, USA}, each tag t sets bits h1(t) and h2(t) of a 10-bit vector. The three tags set five distinct bits (two of the six hash values coincide), and the remaining five bits stay 0.

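A minimal Python sketch of Bloom-filter encoding, assuming hypothetical hash functions built from SHA-256 with the function index as a salt (the slides do not specify the hash functions, so the bits set here will not match the slide's picture). Filters are Python ints, and the subset test on filters is a single bitwise check.

```python
import hashlib
from typing import Iterable

M = 10  # filter size m, as in the slide's example
K = 2   # number of hash functions k

def h(i: int, tag: str, m: int = M) -> int:
    """Hypothetical i-th hash function Tags -> {1, ..., m}, SHA-256 salted with i."""
    digest = hashlib.sha256(f"{i}:{tag}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m + 1

def encode(tags: Iterable[str], m: int = M, k: int = K) -> int:
    """Bloom filter of a tag set as an int; bit (p - 1) represents position p."""
    bf = 0
    for tag in tags:
        for i in range(1, k + 1):
            bf |= 1 << (h(i, tag, m) - 1)
    return bf

def maybe_subset(a: int, b: int) -> bool:
    """Subset test on filters: every bit set in a is also set in b."""
    return (a & ~b) == 0

D = encode(["politics", "Italy", "USA"])
print(sorted(p for p in range(1, M + 1) if (D >> (p - 1)) & 1))  # positions set in D
print(maybe_subset(encode(["politics", "Italy"]), D))  # True
```

The filter test can return false positives (bits set by other tags may cover a filter), but never false negatives: if one tag set is a subset of another, its filter's bits are always a subset of the other filter's bits.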
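Example

Subscribers are identified by keys, and each tag set is stored as its Bloom filter:
k1: 1001101000, 0010010011
k2: 1001000011, 0000101000
. . .

Query: 1011010011

The query contains all the bits of 0010010011 (k1) and of 1001000011 (k2), so both keys match.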

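A quick check of this example with Python ints (the helper names are ours). The matching filters are those whose 1-bits are all set in the query; the query also turns out to be exactly the bitwise OR of the two matching filters, which is what one expects if it encodes the union of the two tag sets, since encoding a union sets the union of the bits.

```python
# Bloom filters from the slide, as 10-bit strings (leftmost = position 1).
K1 = ["1001101000", "0010010011"]
K2 = ["1001000011", "0000101000"]
QUERY = "1011010011"

def b(s: str) -> int:
    return int(s, 2)

def contained(f: str, q: str) -> bool:
    """True if every 1-bit of filter f is also set in query q."""
    return (b(f) & ~b(q)) == 0

matching = {key: [f for f in fs if contained(f, QUERY)]
            for key, fs in (("k1", K1), ("k2", K2))}
print(matching)  # {'k1': ['0010010011'], 'k2': ['1001000011']}

# The query equals the OR of the two matching filters.
print(format(b("0010010011") | b("1001000011"), "010b"))  # 1011010011
```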
Model

Tag set table:

Bit String    Keys
1000100000    k2
1010000100    k4, k2
0110100000    k3
0011100010    k6, k2
0010101000    k5, k2
0000100100    k2

Query stream: 0110101100

Output:
  match: k2, k3, k5, k2 (the keys of every matching bit string, with repetitions)
  match-unique: k2, k3, k5 (each key at most once)

The stream of queries is intense: 6k queries/s.
The database is huge: 212M tag sets.

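A sketch of the two output modes over the table above, as a linear scan in Python. `match` and `match_unique` are hypothetical names, and sorting the unique keys is our choice to make the output deterministic.

```python
from typing import List, Tuple

# Tag set table from the slide: (Bloom filter, keys).
TABLE: List[Tuple[str, List[str]]] = [
    ("1000100000", ["k2"]),
    ("1010000100", ["k4", "k2"]),
    ("0110100000", ["k3"]),
    ("0011100010", ["k6", "k2"]),
    ("0010101000", ["k5", "k2"]),
    ("0000100100", ["k2"]),
]

def is_subset(f: str, q: str) -> bool:
    """Every 1-bit of filter f is also set in query q."""
    return (int(f, 2) & ~int(q, 2)) == 0

def match(q: str) -> List[str]:
    """All keys of all matching filters, repetitions included."""
    return [k for f, keys in TABLE if is_subset(f, q) for k in keys]

def match_unique(q: str) -> List[str]:
    """Each matching key once."""
    return sorted(set(match(q)))

print(match("0110101100"))         # ['k3', 'k5', 'k2', 'k2']
print(match_unique("0110101100"))  # ['k2', 'k3', 'k5']
```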
A Complex Problem (Rivest, 1976)

Throughput (thousand queries per second) by database size:

system                               20M      40M     212M
MongoDB                                —        —        —
GPU-only, plain                     0.40     0.20     0.04
GPU-only, plain with batching      11.50     6.30     1.20
CPU-only, fast prefix tree         21.10    14.00     4.30
CPU-only, state-of-the-art ICN     27.60    17.40        —
CPU-only, TagMatch                  3.90     3.40     0.68
TagMatch                          268.80   144.40    35.30

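Reading the table numerically: the ratio between TagMatch and the fastest other system at each database size, and which systems sustain the 6k queries/s stream quoted on the Model slide at 212M tag sets. The numbers are copied from the table; the code only does the arithmetic.

```python
# Throughput in thousand queries/s, copied from the table (None = no value reported).
SIZES = ["20M", "40M", "212M"]
THROUGHPUT = {
    "MongoDB": [None, None, None],
    "GPU-only, plain": [0.40, 0.20, 0.04],
    "GPU-only, plain with batching": [11.50, 6.30, 1.20],
    "CPU-only, fast prefix tree": [21.10, 14.00, 4.30],
    "CPU-only, state-of-the-art ICN": [27.60, 17.40, None],
    "CPU-only, TagMatch": [3.90, 3.40, 0.68],
    "TagMatch": [268.80, 144.40, 35.30],
}
REQUIRED = 6.0  # 6k queries/s, the stream rate quoted on the Model slide

def speedup(col: int) -> float:
    """TagMatch throughput divided by the best other system at the same size."""
    others = [v[col] for name, v in THROUGHPUT.items()
              if name != "TagMatch" and v[col] is not None]
    return THROUGHPUT["TagMatch"][col] / max(others)

for col, size in enumerate(SIZES):
    print(size, round(speedup(col), 1))  # 20M 9.7, then 40M 8.3, then 212M 8.2

# Systems that keep up with the stream on the largest database (212M).
fast_enough = [name for name, v in THROUGHPUT.items()
               if v[2] is not None and v[2] >= REQUIRED]
print(fast_enough)  # ['TagMatch']
```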
TagMatch

First Approach: using GPUs

CPU: launch the kernel on a batch of queries q0, q1, . . . , q255.

GPU: the kernel runs Block 0, Block 1, . . . , Block n; thread i checks tag set si of the tagset table (s0, s1, . . . , sn) against every query in the batch:

  thread i
    for (q ∈ q0 . . . q255)
      if (si ⊆ q)
        results.add(q)

CPU: merge the matches in results with the keys of the key table.

This is not fast enough: even with batching, the plain GPU-only approach reaches only 11.50, 6.30, and 1.20 thousand queries per second on 20M, 40M, and 212M tag sets.

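A sequential Python stand-in for this first GPU approach, assuming tag sets and queries are Python ints: the outer loop plays the role of the GPU threads (one per tag set) and the inner loop is each thread's scan of the batch. Recording the pair (query, tag-set index) rather than the query alone is our assumption, since the CPU needs the tag set to find its keys; `kernel` and `merge` are hypothetical names.

```python
from typing import List, Tuple

def kernel(tagsets: List[int], batch: List[int]) -> List[Tuple[int, int]]:
    """Sequential stand-in for one launch: 'thread' i tests s_i against every query."""
    results: List[Tuple[int, int]] = []
    for i, s in enumerate(tagsets):        # one GPU thread per tag set
        for j, q in enumerate(batch):      # the batch (q0 ... q255 on the slide)
            if (s & ~q) == 0:              # s_i ⊆ q
                results.append((j, i))     # record query index and tag-set index
    return results

def merge(results: List[Tuple[int, int]], keys: List[List[str]], n: int) -> List[List[str]]:
    """CPU side: replace tag-set indices with their keys, grouped per query."""
    out: List[List[str]] = [[] for _ in range(n)]
    for j, i in results:
        out[j].extend(keys[i])
    return out

# Tag set table of the Model slide.
TAGSETS = [int(s, 2) for s in ["1000100000", "1010000100", "0110100000",
                               "0011100010", "0010101000", "0000100100"]]
KEYS = [["k2"], ["k4", "k2"], ["k3"], ["k6", "k2"], ["k5", "k2"], ["k2"]]

batch = [int("0110101100", 2)]
print(merge(kernel(TAGSETS, batch), KEYS, len(batch)))  # [['k3', 'k5', 'k2', 'k2']]
```

On a real GPU the threads run concurrently, so the order of the results is not deterministic; the simulation only shows the amount of work, since every tag set is tested against every query of the batch.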
Partitioning

Lots of filters share many bits, so we could rule out many filters quickly and efficiently:

Bit String    Keys
1000100000    k2
1010100100    k4, k2
0110100000    k3
0011000010    k6, k2
0011101000    k5, k2
0001100100    k2

Query: 0001011100

Five of the six filters have the fifth bit set while the query does not, so a single bit test discards all five.

And we can do that efficiently on the CPU, while preserving batches.

Model

Input queries (stream), Bloom-filter encoded (⋆ marks a “unique” query):
  {@POTUS, energy, policy} → q1 = 010101 · · · 11
  {@Chomsky, education} → q2 = 011111 · · · 01
  {@ggreenwald, NSA}⋆ → q⋆3 = 001110 · · · 11
  . . .

Pre-process (CPU): the partition table lists the partition masks by the index of their first set bit, and each query is added to the batch of every partition whose mask it contains:
  0: none
  1: 010001 · · · 01 → P1
  2: 001100 · · · 00 → P2, 001010 · · · 11 → P3, 001011 · · · 01 → P4
  3: 000101 · · · 10 → P5
  . . .
  191: . . .
  batch1 (P1): . . . , q2
  batch2 (P2): . . . , q2, q3
  batch3 (P3): . . . , q1, q3
  . . .

Subset match (GPU): each batch is matched against its partition of the tagset table, where every filter has a numeric id:
  P1: 011011 · · · 01 ↔ 1, 010101 · · · 11 ↔ 2, 010101 · · · 01 ↔ 3, . . .
  P2: 001101 · · · 10 ↔ 62, 001101 · · · 01 ↔ 63, 001100 · · · 11 ↔ 64, . . .
  The output is a list of (query, id) pairs per batch:
  results1: (q2, 1), (q2, 3), . . .
  results2: (q2, 63), (q3, 71), . . .
  results3: (q1, 324), (q3, 99), . . .

Key lookup/reduce (CPU): the key table maps ids to keys:
  1 → k1, k2
  3 → k2, k6, k8
  . . .
  63 → k5, k8, k13
  . . .

Merge (CPU): the keys are collected per query into the results stream:
  q1 → k3, k13, . . .
  q2 → k1, k2, k2, k6, k8, k5, k8, k13, . . .
  q⋆3 → k9, k3, k37, k3, k7, . . .
  . . .

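The key-lookup and merge stages can be sketched in Python using only the key-table entries the slide shows (ids 1, 3, and 63). Dropping repeated keys for “unique” queries at this stage is our assumption about where that step happens; `lookup_and_merge` and `KEY_TABLE` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Excerpt of the key table from the slide (tag-set id -> keys).
KEY_TABLE: Dict[int, List[str]] = {1: ["k1", "k2"], 3: ["k2", "k6", "k8"], 63: ["k5", "k8", "k13"]}

def lookup_and_merge(result_batches: List[List[Tuple[str, int]]],
                     unique: Set[str]) -> Dict[str, List[str]]:
    """Map (query, tag-set id) pairs from all batches to per-query key lists."""
    merged: DefaultDict[str, List[str]] = defaultdict(list)
    for batch in result_batches:                # results1, results2, ...
        for query, tagset_id in batch:
            merged[query].extend(KEY_TABLE[tagset_id])
    for query in unique:                        # "unique" queries: keep first occurrence only
        if query in merged:
            merged[query] = list(dict.fromkeys(merged[query]))
    return dict(merged)

results1 = [("q2", 1), ("q2", 3)]
results2 = [("q2", 63)]
print(lookup_and_merge([results1, results2], unique=set()))
# {'q2': ['k1', 'k2', 'k2', 'k6', 'k8', 'k5', 'k8', 'k13']}
print(lookup_and_merge([results1, results2], unique={"q2"}))
# {'q2': ['k1', 'k2', 'k6', 'k8', 'k5', 'k13']}
```

The first call reproduces the key list shown for q2 in the results stream.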
Partitioning

Max size: 3

Initially all filters are in a single partition, P0:
  1000100000, 1010000100, 0110100000, 0011100010, 0010101000,
  0001101101, 0000110100, 0000110001, 0000010110, 0000001110

After the first split:
  P0: 1010000100, 0001101101, 0000110100, 0000010110, 0000001110
  P1: 1000100000, 0110100000, 0011100010, 0010101000, 0000110001

Both partitions still exceed the maximum size, so each is split again. Each final partition is labeled with a mask: the bits set in all of its filters.

  P   Mask          Bit Strings
  0   0000100100    0001101101, 0000110100
  1   0000000100    1010000100, 0000010110, 0000001110
  2   0010100000    0110100000, 0011100010, 0010101000
  3   0000100000    1000100000, 0000110001

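The slides show the result of partitioning but not the splitting rule. One simple rule that reproduces this example exactly, which is our assumption and not necessarily TagMatch's algorithm, is: split an oversized partition on the bit that is set in closest to half of its filters (lowest index on ties), listing the filters that have the bit first. Here the first split is on the eighth bit, which five of the ten filters have. The mask of a partition is the bitwise AND of its filters. All function names are hypothetical.

```python
from typing import List, Optional

def split_bit(group: List[str]) -> Optional[int]:
    """Bit index (0-based, from the left) set in about half of the filters.
    Ties go to the lowest index; None if no bit separates the filters."""
    n = len(group)
    best = None
    for b in range(len(group[0])):
        count = sum(s[b] == "1" for s in group)
        if 0 < count < n:
            key = (abs(2 * count - n), b)  # 2 * |count - n/2|, kept integral
            if best is None or key < best:
                best = key
    return None if best is None else best[1]

def partition(group: List[str], max_size: int) -> List[List[str]]:
    """Split recursively until no partition holds more than max_size filters."""
    b = split_bit(group) if len(group) > max_size else None
    if b is None:
        return [group]
    with_b = [s for s in group if s[b] == "1"]     # filters with the bit come first
    without_b = [s for s in group if s[b] == "0"]
    return partition(with_b, max_size) + partition(without_b, max_size)

def mask(group: List[str]) -> str:
    """Bits set in every filter of the partition (their bitwise AND)."""
    return "".join("1" if all(s[i] == "1" for s in group) else "0"
                   for i in range(len(group[0])))

filters = ["1000100000", "1010000100", "0110100000", "0011100010", "0010101000",
           "0001101101", "0000110100", "0000110001", "0000010110", "0000001110"]
for p, part in enumerate(partition(filters, max_size=3)):
    print(p, mask(part), part)
```

Running it prints the four partitions of the example, from P0 (mask 0000100100) to P3 (mask 0000100000), with the same members in the same order.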
Pre Process

Front end: a thread pool looks up each query in the partition table, which lists the partition masks by the index of their first set bit (counting from 0):
  2: 0010100000 → P2
  4: 0000100100 → P0, 0000100000 → P3
  7: 0000000100 → P1
  . . .
Each query is appended to the queue of every partition whose mask it contains (partition queues P0, P1, P2, P3, . . . , Pn), so a single query (q0, q1, q2, . . .) may land in several queues.

GPU handlers, coordinated by a GPU scheduler, flush a partition queue to the GPU as one batch when the queue is full, or when its timeout expires, so that queries in a rarely used partition are not held back indefinitely.

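A Python sketch of the front end, under stated assumptions: the partition table maps the index of a mask's first set bit to the masks starting there; a query only needs to probe its own set bits, because a mask contained in the query has its first set bit among them; and each partition queue is flushed when it reaches `batch_size` queries or when its oldest query has waited `timeout` time units. `route`, `PartitionQueues`, and the explicit `now` argument are hypothetical; passing the time in keeps the example deterministic.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Partition masks from the partitioning example (partition id -> mask).
MASKS: Dict[int, str] = {0: "0000100100", 1: "0000000100", 2: "0010100000", 3: "0000100000"}

def build_table(masks: Dict[int, str]) -> Dict[int, List[Tuple[int, int]]]:
    """Group (mask, partition) pairs by the 0-based index of the mask's first set bit."""
    table: DefaultDict[int, List[Tuple[int, int]]] = defaultdict(list)
    for p, m in masks.items():
        table[m.index("1")].append((int(m, 2), p))
    return dict(table)

def route(query: str, table: Dict[int, List[Tuple[int, int]]]) -> List[int]:
    """Partitions whose mask is contained in the query, probing only the query's set bits."""
    q = int(query, 2)
    return [p for i, c in enumerate(query) if c == "1"
            for m, p in table.get(i, []) if (m & ~q) == 0]

class PartitionQueues:
    """Hypothetical batching: flush a queue when full or when its oldest query times out."""

    def __init__(self, batch_size: int, timeout: float) -> None:
        self.batch_size, self.timeout = batch_size, timeout
        self.queues: DefaultDict[int, List[str]] = defaultdict(list)
        self.oldest: Dict[int, float] = {}

    def add(self, p: int, query: str, now: float) -> List[Tuple[int, List[str]]]:
        if not self.queues[p]:
            self.oldest[p] = now               # first query of a new batch
        self.queues[p].append(query)
        return self._flush(p) if len(self.queues[p]) >= self.batch_size else []

    def expire(self, now: float) -> List[Tuple[int, List[str]]]:
        flushed: List[Tuple[int, List[str]]] = []
        for p in list(self.queues):
            if self.queues[p] and now - self.oldest[p] >= self.timeout:
                flushed += self._flush(p)      # timeout expired: send a partial batch
        return flushed

    def _flush(self, p: int) -> List[Tuple[int, List[str]]]:
        batch, self.queues[p] = self.queues[p], []
        return [(p, batch)]

TABLE = build_table(MASKS)
print(route("0001011100", TABLE))  # [1]
print(route("0010100000", TABLE))  # [2, 3]
```

A query such as 0001011100 probes only four table slots, and only one of them holds a mask it contains.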

Optimization

GPU Optimization

Each thread of a block holds one filter from the tagset table, and the block processes the whole batch of queries q0 . . . q255:

  Block 0: t0 | 1110110110, t1 | 1110110000, t2 | 1110100000, . . . , t255 | 1110010100
  Block 1: t0 | 0110010110, t1 | 0110001110, t2 | 0101101011, . . . , t255 | 0011101101

The filters are in sorted order (decreasing from t0 to t255), so every filter of a block shares the common prefix of the block's first and last filters.

Phase 1: thread 0 computes the common prefix while the other threads are idle:
  first = 1110110110
  last = 1110010100
  first ⊕ last = 0000100010
  common prefix = 1110000000 (the bits of first that precede the first bit where first and last differ)

Phase 2: each thread j checks whether the common prefix is contained in query qj (prefix ⊆ q0?, prefix ⊆ q1?, . . . , prefix ⊆ qn?). A query that fails this test cannot contain any filter of the block, since all of them have the prefix bits set. The queries that pass form the set Q, for example Q = {q1, q3, q21, q0, q200, q177} (q2 fails).

Phase 3: each thread, holding filter f, checks only the queries in Q:
  for (qi ∈ Q)
    if (f ⊆ qi)
      results.add(qi)

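A Python simulation of the three phases for a single block, assuming filters and queries are fixed-width ints with the leftmost bit as the most significant one, so that numeric order equals the sorted order of the bit strings. The function names are hypothetical; the test below compares the result with a brute-force check of every filter against every query.

```python
from typing import List, Tuple

def common_prefix(first: int, last: int) -> int:
    """Phase 1: keep the bits of `first` above the highest bit where first and last differ."""
    diff = first ^ last
    if diff == 0:
        return first
    return first & ~((1 << diff.bit_length()) - 1)  # clear the differing bit and all lower bits

def block_match(filters: List[int], batch: List[int]) -> List[Tuple[int, int]]:
    """One block; `filters` must be sorted. Returns (thread, query index) pairs."""
    prefix = common_prefix(filters[0], filters[-1])               # Phase 1 (one thread)
    Q = [j for j, q in enumerate(batch) if (prefix & ~q) == 0]    # Phase 2 (thread j tests q_j)
    return [(t, j) for t, f in enumerate(filters) for j in Q      # Phase 3 (thread t, filter f)
            if (f & ~batch[j]) == 0]

print(format(common_prefix(0b1110110110, 0b1110010100), "010b"))  # 1110000000
```

Sorting in either direction works: any bit string that lies between the first and the last one shares their common prefix, so Phase 2 never discards a query that some filter of the block would match.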
Workflow Optimization

Baseline workflow for one batch:
  GPU: run the kernel, which writes the number of results (Size = 3) and the results themselves (Data = q7, q21, q1) to GPU memory.
  CPU: copy the result size to host memory, then synchronize.
  CPU: copy the result data, then synchronize again.

The CPU learns how many results to copy only after the first copy and synchronization, so each batch needs two synchronizations.

Page 78: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

run kernel

Size

3 q7,q21,q1

Data

GPU

CPU

Size

3 q7,q21,q1

Data

copy res size

syn

c

copy res data

syn

cprocess res

18 / 30

Page 79: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

run kernel

copy all res

process ressyn

c

Size Data

GPU

CPU

Size Data

18 / 30

Page 80: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size Data

Size Data

18 / 30

Page 81: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size Data

q207,q17

Size Data

Size Data

Size

2

Data

Page 82: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q207,q17

Size Data

Size Data

q7,q21,q1

Size

2

Data

run kernel

Page 83: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q207,q17

Size Data

Size Data

q7,q21,q1

Size

2

Data

run kernel

copy res

Page 84: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q207,q17

Size

3

Data

q207,q17

Size Data

q7,q21,q1

Size

2

Data

run kernel

copy res

syn

c

Page 85: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q207,q17

Size

3

Data

q207,q17

Size Data

q7,q21,q1

Size

2

Data

run kernel

copy res

syn

c

process res

Page 86: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q87,q12,q1,q5

Size

3

Data

q207,q17

Size

4

Data

q7,q21,q1

Size

2

Data

run kernel

copy res

syn

c

process res

run kernel

Page 87: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q87,q12,q1,q5

Size

3

Data

q207,q17

Size

4

Data

q7,q21,q1

Size

2

Data

run kernel

copy res

syn

c

process res

run kernel

copy res

Page 88: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q87,q12,q1,q5

Size

3

Data

q207,q17

Size

4

Data

q7,q21,q1

Size

4

Data

q7,q21,q1

run kernel

copy res

syn

c

process res

run kernel

copy ressyn

c

Page 89: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

GPU

CPU

Size

3

Data

q87,q12,q1,q5

Size

3

Data

q207,q17

Size

4

Data

q7,q21,q1

Size

4

Data

q7,q21,q1

run kernel

copy res

syn

c

process res

run kernel

copy ressyn

c

process res18 / 30

Page 90: High-Throughput Subset Matching on Commodity GPU-Based … · Subset Match Useful in many scenarios Social networks, Twitter Data Center management Service brokering 2/30

Workflow Optimization

run kernel

copy res size

copy res data

process res

syn

csyn

c

run kernel

copy all res

process res

syn

c

run kernel

copy res

process res

run kernel

copy res

process res

syn

csyn

c

18 / 30
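The double-buffering scheme can be sketched in Python with two reusable result buffers, each holding a size and a data field. This is only an analogy under stated assumptions:

◮ a background thread stands in for the GPU (run kernel plus copy);
◮ the calling thread plays the CPU (process res);
◮ the queues that pass buffers back and forth play the role of the sync points.

`run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading

def run_pipeline(batches, match, process):
    """Double-buffered producer/consumer sketch with two Size+Data buffers."""
    free = queue.Queue()    # buffers the producer may overwrite
    full = queue.Queue()    # buffers waiting to be processed
    for _ in range(2):      # two buffers: one filled while the other is processed
        free.put({"size": 0, "data": []})

    def producer():         # stands in for the GPU side
        for batch in batches:
            buf = free.get()          # blocks until the consumer returns a buffer
            res = match(batch)        # "run kernel"
            buf["data"] = res         # size and data written together
            buf["size"] = len(res)
            full.put(buf)
        full.put(None)                # end-of-stream marker

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        buf = full.get()              # "sync": wait for a filled buffer
        if buf is None:
            break
        process(buf["data"][:buf["size"]])  # "process res"
        free.put(buf)                 # hand the buffer back for reuse
    worker.join()

batches = [["q207", "q17"], ["q7", "q21", "q1"], ["q87", "q12", "q1", "q5"]]
processed = []
run_pipeline(batches, match=list, process=processed.append)
print(processed)
# [['q207', 'q17'], ['q7', 'q21', 'q1'], ['q87', 'q12', 'q1', 'q5']]
```

A buffer returns to the free queue only after its results have been processed. At most two batches are therefore in flight at once: one being processed and one being filled.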

Evaluation

Setup:
◮ a single machine
◮ 24 physical (48 virtual) CPU cores
◮ 2 Nvidia Titan X GPUs

19 / 30

Scalability

Does it scale with bigger databases?

[Figure: throughput (thousand queries/s, log scale) vs. database size (% of the full Twitter database, 20% to 100%). Curves: TagMatch and a prefix tree, each in match and match-unique mode. The plot carries an annotation labeled "Twitter".]

20 / 30

Threads

Does it scale with bigger machines?

[Figure: throughput (thousand queries/s) vs. number of threads (8 to 48). Curves: TagMatch and a prefix tree, each in match and match-unique mode. The plot carries an annotation reading "GPU limit!".]

21 / 30

Latency

Does batching kill latency?

[Figure: latency (s) vs. timeout (200, 400, 600 and 800 ms, and no limit). Shown: the 1st, 25th, 50th (median), 75th and 99th percentiles and the maximum.]

22 / 30

Memory usage

How much memory does it need?

[Figure: memory usage (GB) vs. database size (% of the full Twitter database), split into GPU I/O buffers, GPU tagset table, and host memory.]

23 / 30

Conclusion

Subset matching
◮ computationally complex
◮ highly parallelizable

TagMatch
◮ implements an efficient CPU/GPU pipeline

https://github.com/carzaniga/TagMatch

24 / 30


Partition size

[Figure: throughput (thousand queries/s) vs. MAXP, the maximum size of partitions (thousands), for match and match-unique.]

26 / 30

MongoDB

[Figure: throughput (queries/s, log scale) vs. number of tags per query (4 to 10). Curves: TagMatch and MongoDB, each at 1M, 3M and 5M.]

27 / 30

Partitioning time

[Figure: time (s) of balanced partitioning vs. database size (% of the full Twitter database).]

28 / 30

More tags

[Figure: two panels comparing TagMatch and a prefix tree against the number of additional tags per query (0 to 9). Left: throughput (thousand queries/s, log scale). Right: output throughput (thousand keys/s, log scale).]

29 / 30


Descriptor Representation

Representation of tagsets with Bloom filters:
◮ a bitvector of size m
◮ k independent hash functions h1, ..., hk
◮ hi : Tags → {1, ..., m}

Example (k = 2, m = 10): [Figure: a 10-bit vector, positions 1 to 10, encoding D = {politics, Italy, USA}. Each tag sets the bits chosen by h1 and h2, and five bits end up set.]

Concretely, in our implementation: m = 192, k = 7
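Here is a small Python sketch of this representation with m = 192 and k = 7, using Python integers as bitvectors. The hash functions are an assumption: the k positions are carved out of one SHA-256 digest, which need not match TagMatch's hashing. Positions also run from 0 to m - 1 instead of 1 to m. The names `positions`, `bloom` and `maybe_subset` are hypothetical.

```python
import hashlib

M, K = 192, 7  # filter width and number of hash functions from the slide

def positions(tag: str, m: int = M, k: int = K) -> list:
    """k bit positions in [0, m) for a tag.
    Assumed scheme: one 4-byte slice of a SHA-256 digest per hash function."""
    digest = hashlib.sha256(tag.encode("utf-8")).digest()  # 32 bytes, enough for k <= 8
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def bloom(tags) -> int:
    """Bloom filter of a tag set, as an m-bit integer."""
    bits = 0
    for tag in tags:
        for p in positions(tag):
            bits |= 1 << p
    return bits

def maybe_subset(s1, s2) -> bool:
    """Filter-level test for S1 ⊆ S2: always True when S1 ⊆ S2 really holds,
    but may also return True for a non-subset (false positive)."""
    return bloom(s1) & ~bloom(s2) == 0

D = {"politics", "Italy", "USA"}
print(maybe_subset({"Italy"}, D))          # True
print(bin(bloom(D)).count("1") <= 3 * K)   # True: at most k bits per tag
```

The subset test never misses a true subset. Its only error mode is the false positive quantified on this slide.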

False positives: testing S1 ⊆ S2 with Bloom filters gives a false positive with probability

$$\left(1 - e^{-k|S_2|/m}\right)^{k|S_1 \setminus S_2|}$$

For example, with m = 192 and k = 7, when |S2| = 10 and |S1 \ S2| = 3, we have a false positive with probability of about $10^{-11}$.
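The formula can be checked numerically with a short Python sketch (function name hypothetical) that evaluates it for the slide's parameters. The factor inside the parentheses is the standard approximation of the chance that one given bit is set after S2's |S2| tags have been inserted with k hash functions each. A false positive requires all k·|S1 \ S2| bits of the extra tags to be set.

```python
import math

def false_positive_probability(m: int, k: int, s2_size: int, extra: int) -> float:
    """(1 - e^{-k|S2|/m})^(k|S1 minus S2|): chance that S1 ⊆ S2 is wrongly reported."""
    bit_is_set = 1.0 - math.exp(-k * s2_size / m)  # one bit set by S2's k*|S2| hashes
    return bit_is_set ** (k * extra)                # every extra bit must be set

p = false_positive_probability(m=192, k=7, s2_size=10, extra=3)
print(f"{p:.1e}")  # 1.5e-11
```

The sketch treats the k·|S1 \ S2| positions as distinct, so it is an approximation. Its result agrees with the slide's order of magnitude of about 10^-11.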

30 / 30