32
Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited Cagri Balkesen, Gustavo Alonso, Jens Teubner, M. Tamer ¨ Ozsu Presented by William Qian 2020 April 16 6.886 Spring 2020 William Qian Sort vs. Hash 2020 April 16 1 / 32

Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Multi-Core, Main-Memory Joins: Sort vs. Hash

Revisited

Cagri Balkesen, Gustavo Alonso, Jens Teubner, M. Tamer Ozsu

Presented by William Qian

2020 April 166.886 Spring 2020

William Qian Sort vs. Hash 2020 April 16 1 / 32

Page 2: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Overview

1 Background

2 Parallel sort-merge joins

3 Parallel hash joins

4 Evaluation

5 Discussion

William Qian Sort vs. Hash 2020 April 16 2 / 32

Page 3: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Background

1 Background

2 Parallel sort-merge joins

3 Parallel hash joins

4 Evaluation

5 Discussion

William Qian Sort vs. Hash 2020 April 16 3 / 32

Page 4: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Background

Sort-merge joins

SELECT * FROM R, S WHERE F(R.key) = G(S.key)

Sort phase: sort R ’s keys according to F and S ’s keys according to GMerge phase: mergesort-style matching of keys from R and S

Works for any comparator

Requires sorting

Sorting is known to be parallelizable

Merging is much harder to parallelize

William Qian Sort vs. Hash 2020 April 16 4 / 32

Page 5: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Background

Hash joins

SELECT * FROM R, S WHERE F(R.key) = G(S.key)

Build phase: create base hashtable H from applying F to keys of RProbe phase: apply G to keys in S and find matches in H to join

Embarrassingly parallel

Requires lots of memory to store H

Frequently incurs cache misses for large tables

Requires equijoins (which are fairly common)

William Qian Sort vs. Hash 2020 April 16 5 / 32

Page 6: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Background

Non-uniform memory access

P1

P2

P3

P4

M1 M2

P1 can access M1 easily, but M2 is a little more costly

Lots of data movement to “farther” memory increasesbandwidth congestion

William Qian Sort vs. Hash 2020 April 16 6 / 32

Page 7: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel sort-merge joins

1 Background

2 Parallel sort-merge joins

3 Parallel hash joins

4 Evaluation

5 Discussion

William Qian Sort vs. Hash 2020 April 16 7 / 32

Page 8: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel sort-merge joins

Parallel run-generation

Sorting networks

Few data dependencies

No branching

Only sorts across vectors

e = min(a, b)

f = max(a, b)

g = min(c, d)

h = max(c, d)

i = min(e, g)

j = min(f, h)

w = min(e, g)

x = min(i, j)

y = max(i, j)

z = max(f, g)

William Qian Sort vs. Hash 2020 April 16 8 / 32

Page 9: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel sort-merge joins

Parallel run-generation

5 28 1 13 20 8 2 15 12 14 19 17 6 7 22 21

5 7 1 13 6 8 2 15 12 14 19 17 20 28 22 21

5 6 12 20 7 8 14 28 1 2 19 22 13 15 17 21

a

b

Sorting network in (a) generates vectors sorted across positions

Shuffling in (b) transposes vectors so that each vector is sorted

William Qian Sort vs. Hash 2020 April 16 9 / 32

Page 10: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel sort-merge joins

Parallel merge

Bitonic merge networks

Scales poorly

Used as a kernel sort

Adds branch predictions

Avoids scalar-vector registermovement

William Qian Sort vs. Hash 2020 April 16 10 / 32

Page 11: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel sort-merge joins

Out-of-cache sorting

Multi-way merging

Two-way merge unitsconnected with FIFO buffers

External memory bandwidthonly at front of multi-waymerge tree

Helps combat NUMA

William Qian Sort vs. Hash 2020 April 16 11 / 32

Page 12: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel sort-merge joins

Sort-merge: choose your fighter

m-way

NUMA-local partitions

Tables sorted symmetrically

Multiway merging for

Single-pass merge join

m-pass

Similar to m-pass

Two-way bitonic merginginstead of multiway merging

mpsm

Globally partitions & sortsone table

Partially sorts the other table

Keys in S are a subset of keysin R

First table merged w/ NUMAremote runs of second table

William Qian Sort vs. Hash 2020 April 16 12 / 32

Page 13: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel hash joins

Radix partitioning

Problem: large hashtables result in many cache missesSolution: radix partitioning

Moves tuples to destination partitions (pages)

Reduces TLB miss effects during partitioning

TLB size limits the fan-out of the partitioning step

William Qian Sort vs. Hash 2020 April 16 13 / 32

Page 14: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel hash joins

Software-managed buffers

Problem: radix partitioning is limited by TLB sizesSolution: buffer writes in cache

Extra copy step

TLB fetch only needed once every N tuples in a partition

More I/O reordering due to buffered writes & less TLB pressure

Cache line-sized buffers can enable blind writes, which are faster

William Qian Sort vs. Hash 2020 April 16 14 / 32

Page 15: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Parallel hash joins

Hash: choose your fighter

radix

Parallel radix-hash join

Partitioned according toradix-hash

Cache-local hash joins onpartition pairs

n-part

Emabarrassingly-parallelizedhash join

Tables sharded/striped acrossworkers

Build a shared hashtablebased on one table

Hash-and-match with thesecond table

William Qian Sort vs. Hash 2020 April 16 15 / 32

Page 16: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

1 Background

2 Parallel sort-merge joins

3 Parallel hash joins

4 Evaluation

5 Discussion

William Qian Sort vs. Hash 2020 April 16 16 / 32

Page 17: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Setup

Benchmarks:

m-way (sort-merge)

m-pass (sort-merge)

mpsm (sort-merge)

radix (hash)

n-part (hash)

Workloads:

Column-store

4-byte keys and values, allintegers

Keys in S are a proper subsetof keys in R

Generally uniform keydistribution in S

William Qian Sort vs. Hash 2020 April 16 17 / 32

Page 18: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Environment

256-bit AVX (floating-point only)

64 threads = 4 sockets, 8 cores/socket, hyperthreading enabled

L1/L2/L3 cache sizes: 32KiB/256KiB/20MiB

L3 is socket-local

Cache line size: 64B

TLB1: 64 entries for 64KiB pages; 32 entries for 2MiB pages

TLB2: page size 4KiB, 512 entries per TLB1 entry

William Qian Sort vs. Hash 2020 April 16 18 / 32

Page 19: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Experiments

Sorting baseline Merging baseline Partitioning

Alternative merges m-way factors Input size

Data skew Scalability

William Qian Sort vs. Hash 2020 April 16 19 / 32

Page 20: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Sorting baselines

Evaluating single-threadedperformance

Confirm that AVX sorting isefficient

William Qian Sort vs. Hash 2020 April 16 20 / 32

Page 21: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Merging

Larger merging fan-ins lead tosmaller buffers

Software managed buffersperform stably

Idea: partition instead ofmerge

William Qian Sort vs. Hash 2020 April 16 21 / 32

Page 22: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Merging

Partition-then-sort: range-partition, sort, concatenate

Sort-then-merge: what we’ve been discussing

Partitioning doesn’t degrade like merging does!

William Qian Sort vs. Hash 2020 April 16 22 / 32

Page 23: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Sort-merge champion: m-way

Multi-way merge helps whenmemory is contended

AVX benefit is persistent

William Qian Sort vs. Hash 2020 April 16 23 / 32

Page 24: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Hash champion: radix-hash

Radix-hash with software-managed buffers [2]

William Qian Sort vs. Hash 2020 April 16 24 / 32

Page 25: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Sort vs. Hash: Input size

Radix-hash wins at smallersizes

Radix-hash degrades quicklywith larger sizes

m-way doesn’t degrade withtable size, but

m-way performs ≈radix-hashat best

William Qian Sort vs. Hash 2020 April 16 25 / 32

Page 26: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Sort vs. Hash: Skew

Radix-hash

Fine-granular taskdecomposition [2, 3]

Redistributes “hotter”partitions to all threads

m-way

Multi-way merging’s two-stepapproach:

1 Sub-task merges, split inNUMA region

2 Special handling for heavyhitters

William Qian Sort vs. Hash 2020 April 16 26 / 32

Page 27: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Sort vs. Hash: Scalability

Sort-merge algorithms allscale

Radix-hash scales as well

William Qian Sort vs. Hash 2020 April 16 27 / 32

Page 28: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

Sort vs. Hash

Radix-hash works well

m-way is about similar forlarger joins

Hash joins are still the winners

William Qian Sort vs. Hash 2020 April 16 28 / 32

Page 29: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Evaluation

References

Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Ozsu.

Multi-core, main-memory joins: Sort vs. hash revisited.Proceedings of the VLDB Endowment, 7(1):85–96, 2013.

Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M Tamer Ozsu.

Main-memory hash joins on multi-core cpus: Tuning to the underlying hardware.In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 362–373. IEEE, 2013.

Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea

Di Blas, and Pradeep Dubey.Sort vs. hash revisited: Fast join implementation on modern multi-core cpus.Proceedings of the VLDB Endowment, 2(2):1378–1389, 2009.

William Qian Sort vs. Hash 2020 April 16 29 / 32

Page 30: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Discussion

1 Background

2 Parallel sort-merge joins

3 Parallel hash joins

4 Evaluation

5 Discussion

William Qian Sort vs. Hash 2020 April 16 30 / 32

Page 31: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Discussion

Feedback

Positive:

Paper layout is very readable!

Lots of appropriate data visuals

Thorough work on minimizing effects of external factors

Good balance of self and cross-system comparisons

Constructive:

Throughput vs execution time graphs can be confusing

Hyperthread scaling cap for memory-restricted workloads iswell-known

Generally should avoid benchmarking with hyperthreads

William Qian Sort vs. Hash 2020 April 16 31 / 32

Page 32: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Discussion

Discussion

1 How could multi-way merging benefit from advances with(parallel) funnelsort?

2 How would a non-NUMA architecture affect these results?3 How could these results translate to other database data

layouts?

Delta encodingsBit vector layouts

William Qian Sort vs. Hash 2020 April 16 32 / 32