
Multi-Way Hash Join Effectiveness


M.Sc. Thesis: Michael Henderson

Supervisor: Dr. Ramon Lawrence


Outline

• Motivation
• Database Terminology
• Background
• Joins
• Multi-Way Joins
• Thesis Questions
• Experimental Results
• Conclusions


Motivation

• Data is everywhere
• Governments collect data on citizens
• Facebook collects data on over 1 billion people
• Wal-Mart and Target collect sales data on all their customers

• The goal is to make answering the big questions:
  – Possible
  – Faster


Database Terminology: Relations (Tables)

Part relation:
partkey  name    retailprice
1        Box     0.50
2        Hat     25.00
3        Bottle  2.50

Lineitem relation:
linenumber  partkey  quantity  saleprice
1           1        1         0.50
2           1        1         0.50
3           2        3         22.50
4           3        15        2.50

Each row is a tuple, each column is an attribute, and the first line of each table gives the attribute names. The tables are related through their partkey attributes.


Database Terminology II: SQL

• Structured Query Language
• Used to ask the database questions about the data
• Standardized
• Example: SQL for retrieving all rows from the part table

SELECT * FROM Part;


Database Terminology III: Join

• Joins are used to combine the data in database tables
• Joins are expensive to evaluate on large tables
• We want joins to be faster


Background


What Makes Queries Slow?

• All the data must be read to give an accurate answer
• Data is usually much larger than what can fit in memory
• Operations such as filtering, ordering, and joins are costly
• A join is especially costly
  – May need to match every row of one table against every row of another: O(n²) comparisons for two tables of n rows
  – May need to perform many slow disk operations (I/Os)


Background: Example Join Query

SQL:
SELECT * FROM Part p, Lineitem l
WHERE p.partkey = l.partkey;

Relational algebra: Part ⋈ Lineitem, with join condition p.partkey = l.partkey

Join results:
partkey  name    retailprice  linenumber  partkey  quantity  saleprice
1        Box     0.50         1           1        1         0.50
1        Box     0.50         2           1        1         0.50
2        Hat     25.00        3           2        3         22.50
3        Bottle  2.50         4           3        15        2.50


Nested Loop Join

A nested loop join compares every Part tuple with every Lineitem tuple and outputs each pair whose partkey values match. For the Part and Lineitem relations above, this takes 3 × 4 = 12 comparisons and produces four result rows:
partkey  name    retailprice  linenumber  partkey  quantity  saleprice
1        Box     0.50         1           1        1         0.50
1        Box     0.50         2           1        1         0.50
2        Hat     25.00        3           2        3         22.50
3        Bottle  2.50         4           3        15        2.50
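A minimal Python sketch of the nested loop join, using the example Part and Lineitem rows as tuples. The function name `nested_loop_join` and the tuple layout are illustrative assumptions, not code from the thesis; the two nested loops make the quadratic number of comparisons explicit.

```python
# Example rows from the Part and Lineitem relations.
PART = [(1, "Box", 0.50), (2, "Hat", 25.00), (3, "Bottle", 2.50)]  # (partkey, name, retailprice)
LINEITEM = [  # (linenumber, partkey, quantity, saleprice)
    (1, 1, 1, 0.50),
    (2, 1, 1, 0.50),
    (3, 2, 3, 22.50),
    (4, 3, 15, 2.50),
]


def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare every outer tuple with every inner tuple: |outer| * |inner| comparisons."""
    results = []
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                results.append(o + i)  # concatenate the matching tuples
    return results


if __name__ == "__main__":
    for row in nested_loop_join(PART, LINEITEM, outer_key=0, inner_key=1):
        print(row)
```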


Dynamic Hash Join

Part is split into three partitions with the hash function partition = ((partkey - 1) mod 3) + 1:
  partkey 1: ((1 - 1) mod 3) + 1 = 1
  partkey 2: ((2 - 1) mod 3) + 1 = 2
  partkey 3: ((3 - 1) mod 3) + 1 = 3

Part1 (partkey, name, retailprice): (1, Box, 0.50)
Part2: (2, Hat, 25.00)
Part3: (3, Bottle, 2.50)

The partitions are saved to disk.


Lineitem is partitioned on partkey with the same hash function, partition = ((partkey - 1) mod 3) + 1, giving three Lineitem partitions:
Lineitem1 (linenumber, partkey, quantity, saleprice): (1, 1, 1, 0.50), (2, 1, 1, 0.50)
Lineitem2: (3, 2, 3, 22.50)
Lineitem3: (4, 3, 15, 2.50)

Matching partitions are then joined pairwise. Part1 is loaded into an in-memory hash table and probed with the tuples of Lineitem1, producing:
partkey  name  retailprice  linenumber  partkey  quantity  saleprice
1        Box   0.50         1           1        1         0.50
1        Box   0.50         2           1        1         0.50


Part2 (2, Hat, 25.00) is joined with Lineitem2 (3, 2, 3, 22.50), adding the row 2 Hat 25.00 3 2 3 22.50. Joining Part3 with Lineitem3 adds 3 Bottle 2.50 4 3 15 2.50, which completes the result:
partkey  name    retailprice  linenumber  partkey  quantity  saleprice
1        Box     0.50         1           1        1         0.50
1        Box     0.50         2           1        1         0.50
2        Hat     25.00        3           2        3         22.50
3        Bottle  2.50         4           3        15        2.50
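The partition-then-join steps above can be sketched as a simplified partitioned hash join in Python. This is an illustration under stated assumptions, not the thesis implementation: a real dynamic hash join keeps some partitions in memory and spills the rest to disk, whereas here every partition is just an in-memory list, and the helper names (`partition_of`, `partitioned_hash_join`) are hypothetical.

```python
from collections import defaultdict

PART = [(1, "Box", 0.50), (2, "Hat", 25.00), (3, "Bottle", 2.50)]  # (partkey, name, retailprice)
LINEITEM = [(1, 1, 1, 0.50), (2, 1, 1, 0.50), (3, 2, 3, 22.50), (4, 3, 15, 2.50)]

NUM_PARTITIONS = 3


def partition_of(key):
    # Hash function from the example: ((key - 1) mod 3) + 1, giving partitions 1..3.
    return ((key - 1) % NUM_PARTITIONS) + 1


def partition(relation, key_index):
    """Split a relation by hash value; a real DHJ would write most partitions to disk."""
    parts = defaultdict(list)
    for row in relation:
        parts[partition_of(row[key_index])].append(row)
    return parts


def partitioned_hash_join(build, probe, build_key, probe_key):
    build_parts = partition(build, build_key)
    probe_parts = partition(probe, probe_key)
    results = []
    for p in range(1, NUM_PARTITIONS + 1):
        # Build an in-memory hash table on one build partition ...
        table = defaultdict(list)
        for row in build_parts.get(p, []):
            table[row[build_key]].append(row)
        # ... and probe it only with the matching probe partition.
        for row in probe_parts.get(p, []):
            for match in table.get(row[probe_key], []):
                results.append(match + row)
    return results


if __name__ == "__main__":
    for row in partitioned_hash_join(PART, LINEITEM, build_key=0, probe_key=1):
        print(row)
```

Because both inputs use the same hash function, a Lineitem tuple can only match Part tuples in the partition with the same number, so each partition pair is joined independently.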


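Join Three Tables

SELECT A.a_key, B.b_key, C.c_key FROM A, B, C
WHERE A.a_key = B.a_key AND A.a_key = C.a_key;

With binary joins, the query is evaluated as two joins: A and B are joined on A.a_key = B.a_key, and that intermediate result is joined with C on A.a_key = C.a_key.
• Left Deep Plan: the intermediate result (A ⋈ B) is the left input of the join with C
• Right Deep Plan: the intermediate result (A ⋈ B) is the right input of the join with C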


Multi-way Hash Joins

• Join multiple relations at the same time
• Share memory across the entire join
• Produce a result by combining tuples from all relations
• Do not have to repartition intermediate results
• Perform fewer disk operations

Multi-way plan: a single join operator takes A, B, and C as inputs, with the condition A.a_key = B.a_key and A.a_key = C.a_key.
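To make the "fewer disk operations" point concrete, here is a back-of-envelope page I/O count under a simplified model that is an assumption of this sketch, not a formula from the thesis: each partitioned base input costs 3 × its size (read, write partitions, read back), and an intermediate result produced in memory costs 2 × its size when the next binary join repartitions it. Under this model the multi-way plan saves exactly the repartitioning of the intermediate result.

```python
def binary_plan_io(a, b, c, intermediate):
    """Page I/Os for ((A join B) join C) under a simplified partitioned-hash-join model.

    Assumed model: each base input is read, written as partitions, and read back
    (3 * size). The intermediate result is produced in memory, so repartitioning it
    for the second join costs one write and one read (2 * size).
    """
    first_join = 3 * (a + b)
    second_join = 2 * intermediate + 3 * c
    return first_join + second_join


def multiway_plan_io(a, b, c):
    """Same model for a multi-way hash join: all inputs partitioned once, no intermediate."""
    return 3 * (a + b + c)


if __name__ == "__main__":
    # Hypothetical sizes in pages.
    print(binary_plan_io(100, 500, 500, intermediate=800))
    print(multiway_plan_io(100, 500, 500))
```

With these hypothetical sizes the binary plan costs 4900 page I/Os and the multi-way plan 3300; the gap grows with the size of the intermediate result.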


Hash Teams

• A multi-way hash join
• Hash Teams joins relations on a common attribute


Hash Teams Example

A (a_key): 1, 2, 3
B (b_key, a_key): (1, 1), (2, 2), (3, 3), (4, 1), (5, 2)
C (c_key, a_key): (1, 3), (2, 1), (3, 2), (4, 2), (5, 1)

SELECT A.a_key, B.b_key, C.c_key FROM A, B, C
WHERE A.a_key = B.a_key AND A.a_key = C.a_key;
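For contrast, the binary left deep plan from the Join Three Tables slide can be run on this example data with two in-memory hash joins. The helper names are hypothetical and nothing is spilled to disk; the point of the sketch is the 5-tuple intermediate result A ⋈ B that a binary plan must hand to the second join, which with real data may have to be written out and repartitioned.

```python
from collections import defaultdict

A = [(1,), (2,), (3,)]                         # (a_key,)
B = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)]   # (b_key, a_key)
C = [(1, 3), (2, 1), (3, 2), (4, 2), (5, 1)]   # (c_key, a_key)


def hash_join(build, probe, build_key, probe_key):
    """Binary in-memory hash join: build on one input, probe with the other."""
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)
    return [match + row for row in probe for match in table.get(row[probe_key], [])]


def left_deep_plan():
    # Join 1: A.a_key = B.a_key. Rows are (a_key, b_key, a_key).
    ab = hash_join(A, B, build_key=0, probe_key=1)
    # Join 2: the intermediate result is joined with C on A.a_key = C.a_key.
    # Rows are (a_key, b_key, a_key, c_key, a_key).
    abc = hash_join(ab, C, build_key=0, probe_key=1)
    # Project A.a_key, B.b_key, C.c_key as in the SELECT list.
    return ab, [(row[0], row[1], row[3]) for row in abc]


if __name__ == "__main__":
    intermediate, result = left_deep_plan()
    print(len(intermediate), "intermediate tuples,", len(result), "result tuples")
```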


Partitioning A and B

A and B are both partitioned on a_key with the hash function partition = ((a_key - 1) mod 3) + 1:
  a_key 1: ((1 - 1) mod 3) + 1 = 1
  a_key 2: ((2 - 1) mod 3) + 1 = 2
  a_key 3: ((3 - 1) mod 3) + 1 = 3

A is partitioned first, giving A1 (a_key 1), A2 (a_key 2), and A3 (a_key 3).


B is then partitioned on its a_key attribute with the same hash function:
B1 (b_key, a_key): (1, 1), (4, 1)
B2: (2, 2), (5, 2)
B3: (3, 3)


Processing C

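A1 and B1 are kept in memory; the other partitions of A and B are on disk. C is then read, and each tuple is hashed on a_key with the same hash function, partition = ((a_key - 1) mod 3) + 1. Tuples in partition 1 are joined immediately with A1 and B1; the others are written to the disk partitions C2 and C3.

C (c_key, a_key): (1, 3), (2, 1), (3, 2), (4, 2), (5, 1)

The C tuple (2, 1) falls in partition 1 and matches A1 (a_key 1) and B1 ((1, 1), (4, 1)), producing:
a_key  b_key  c_key
1      1      2
1      4      2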


The C tuple (5, 1) is also joined in memory. The remaining C tuples go to disk: C2 holds (3, 2) and (4, 2), and C3 holds (1, 3). The disk partitions are then joined in turn (A2 with B2 and C2, then A3 with B3 and C3), completing the result:
a_key  b_key  c_key
1      1      2
1      4      2
1      1      5
1      4      5
2      2      3
2      5      3
2      2      4
2      5      4
3      3      1
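The Hash Teams walkthrough above can be sketched in Python as follows. It is a simplified illustration under stated assumptions rather than the thesis implementation: all partitions are in-memory lists instead of one memory-resident partition plus disk partitions, and the names are hypothetical. It does show the key property: A, B, and C are each partitioned once on the shared a_key, and each C tuple is matched against A and B in one step, with no intermediate A-B result.

```python
from collections import defaultdict

A = [(1,), (2,), (3,)]                         # (a_key,)
B = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)]   # (b_key, a_key)
C = [(1, 3), (2, 1), (3, 2), (4, 2), (5, 1)]   # (c_key, a_key)
NUM_PARTITIONS = 3


def partition_of(a_key):
    # Same hash function for every relation: ((a_key - 1) mod 3) + 1.
    return ((a_key - 1) % NUM_PARTITIONS) + 1


def partition(relation, key_index):
    parts = defaultdict(list)
    for row in relation:
        parts[partition_of(row[key_index])].append(row)
    return parts


def hash_teams_join():
    # Each relation is partitioned exactly once, on the common attribute a_key.
    a_parts, b_parts, c_parts = partition(A, 0), partition(B, 1), partition(C, 1)
    results = []
    for p in range(1, NUM_PARTITIONS + 1):
        # Hash tables on the A and B partitions share the join's memory.
        a_keys = {row[0] for row in a_parts.get(p, [])}
        b_table = defaultdict(list)
        for b_key, a_key in b_parts.get(p, []):
            b_table[a_key].append(b_key)
        # Each C tuple is matched against A and B in one step, so no
        # intermediate A-B result is materialized or repartitioned.
        for c_key, a_key in c_parts.get(p, []):
            if a_key in a_keys:
                for b_key in b_table.get(a_key, []):
                    results.append((a_key, b_key, c_key))
    return results


if __name__ == "__main__":
    for row in hash_teams_join():
        print(row)
```

On the example data it returns the nine (a_key, b_key, c_key) rows in the same order as the results table above.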


Generalized Hash Teams (GHT)

• Extends Hash Teams
• Does not need the join attributes to be the same
• Uses indirect partitioning
• Needs an in-memory map to indirectly join relations


GHT Partition Maps

• Uses join memory
• Uses a bitmap to approximate the mapping, reducing memory requirements
• Needs a bitmap for each partition
• Bitmaps introduce mapping errors that cause tuples to be mapped to multiple partitions (false drops)
• False drops add I/O and processing cost


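GHT Example

SELECT c.custkey, o.orderkey, l.partkey FROM Customer c, Orders o, Lineitem l
WHERE c.custkey = o.custkey AND o.orderkey = l.orderkey;

Customer (custkey): 1, 2, 3
Orders (orderkey, custkey): (1, 1), (2, 2), (3, 3), (4, 1), (5, 2)
Lineitem (orderkey, partkey): (1, 1), (1, 2), (2, 3), (2, 4), (3, 1), (3, 8), (4, 5), (4, 6), (5, 4)

Lineitem has no custkey attribute, so it can only be partitioned by customer indirectly, through the orderkey-to-custkey mapping provided by Orders.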


GHT Customer Partitions

Hash function: partition = ((custkey - 1) mod 3) + 1
Customer1: custkey 1
Customer2: custkey 2
Customer3: custkey 3


Orders Partitions and Bitmap

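Orders is partitioned on custkey with the same hash function, partition = ((custkey - 1) mod 3) + 1:
Orders1 (orderkey, custkey): (1, 1), (4, 1)
Orders2: (2, 2), (5, 2)
Orders3: (3, 3)

While Orders is partitioned, each tuple's orderkey is also recorded in a 4-bit bitmap for its partition (B1, B2, B3, all initially 0000), at bit position index = (orderkey + 1) mod 4, where position 0 is the leftmost bit:
  orderkey 1 → index 2, partition 1: B1 = 0010
  orderkey 4 → index 1, partition 1: B1 = 0110
  orderkey 2 → index 3, partition 2: B2 = 0001
  orderkey 5 → index 2, partition 2: B2 = 0011
  orderkey 3 → index 0, partition 3: B3 = 1000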


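After all of Orders has been processed, the bitmaps are (one row per bitmap index, one column per partition):
index  B1  B2  B3
0      0   0   1
1      1   0   0
2      1   1   0
3      0   1   0

B1 = 0110, B2 = 0011, B3 = 1000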
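The bitmap construction above can be reproduced with a short Python sketch. It assumes, as in the example, 4-bit bitmaps indexed by (orderkey + 1) mod 4 with position 0 printed leftmost; the function names are hypothetical, and a real implementation would pack bits into machine words rather than use lists.

```python
ORDERS = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)]  # (orderkey, custkey)
NUM_PARTITIONS = 3
BITMAP_BITS = 4


def customer_partition(custkey):
    # Orders is partitioned directly on custkey: ((custkey - 1) mod 3) + 1.
    return ((custkey - 1) % NUM_PARTITIONS) + 1


def bitmap_index(orderkey):
    # Bitmap index from the example: (orderkey + 1) mod 4.
    return (orderkey + 1) % BITMAP_BITS


def build_bitmaps(orders):
    """One bitmap per partition; a bit is set when an orderkey hashing to it lands there."""
    bitmaps = {p: [0] * BITMAP_BITS for p in range(1, NUM_PARTITIONS + 1)}
    for orderkey, custkey in orders:
        bitmaps[customer_partition(custkey)][bitmap_index(orderkey)] = 1
    return bitmaps


def as_string(bits):
    # Position 0 is printed leftmost, matching the bitmaps in the example.
    return "".join(str(b) for b in bits)


if __name__ == "__main__":
    for p, bits in build_bitmaps(ORDERS).items():
        print(f"B{p} = {as_string(bits)}")
```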


Lineitem Partitions with False Drops

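Lineitem is partitioned indirectly: for each tuple, compute index = (orderkey + 1) mod 4 and send the tuple to every partition whose bitmap has that bit set.
Bitmaps: B1 = 0110, B2 = 0011, B3 = 1000
Lineitem1 (orderkey, partkey): (1, 1), (1, 2), (4, 5), (4, 6), (5, 4)
Lineitem2: (1, 1), (1, 2), (2, 3), (2, 4), (5, 4)
Lineitem3: (3, 1), (3, 8)

Orderkeys 1 and 5 both map to index 2, which is set in B1 and B2, so their tuples are written to both partitions. This produces three false drops: (1, 1) and (1, 2) in Lineitem2 (orderkey 1 belongs to partition 1), and (5, 4) in Lineitem1 (orderkey 5 belongs to partition 2).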
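Routing Lineitem through these bitmaps, and labelling the false drops, can be sketched as below. The bitmaps are taken from the example; `ORDER_PARTITION` is an exact orderkey-to-partition map that a bitmap-based GHT would not keep, and it is included here only so the sketch can identify which placements are false drops. All names are hypothetical.

```python
LINEITEM = [(1, 1), (1, 2), (2, 3), (2, 4), (3, 1), (3, 8), (4, 5), (4, 6), (5, 4)]  # (orderkey, partkey)
BITMAPS = {1: "0110", 2: "0011", 3: "1000"}  # final bitmaps from the example
# Exact orderkey -> partition map (from Orders). A bitmap-based GHT does not keep this;
# it is used here only to label false drops.
ORDER_PARTITION = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2}


def bitmap_index(orderkey):
    return (orderkey + 1) % 4


def route_lineitem(lineitem, bitmaps):
    """Send each tuple to every partition whose bitmap has its bit set."""
    partitions = {p: [] for p in bitmaps}
    false_drops = []
    for orderkey, partkey in lineitem:
        idx = bitmap_index(orderkey)
        for p, bits in bitmaps.items():
            if bits[idx] == "1":
                partitions[p].append((orderkey, partkey))
                if ORDER_PARTITION[orderkey] != p:
                    false_drops.append((p, orderkey, partkey))
    return partitions, false_drops


if __name__ == "__main__":
    parts, drops = route_lineitem(LINEITEM, BITMAPS)
    for p, rows in parts.items():
        print(f"Lineitem{p}: {rows}")
    print("false drops (partition, orderkey, partkey):", drops)
```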


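Joining the Partitions

Each group of matching partitions is joined. For partition 1, Customer1 (custkey 1), Orders1 ((1, 1), (4, 1)), and Lineitem1 are joined: (1, 1), (1, 2), (4, 5), and (4, 6) produce results, while the false drop (5, 4) finds no matching order in Orders1 and produces nothing, although it still had to be written, read, and probed. Joining all three partition groups gives:
custkey  orderkey  partkey
1        1         1
1        1         2
1        4         5
1        4         6
2        2         3
2        2         4
2        5         4
3        3         1
3        3         8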
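A sketch of joining one partition group (Customer1, Orders1, Lineitem1) in Python, assuming orderkey is a key of Orders so that a dictionary can hold the orderkey-to-custkey mapping. The false drop is probed like any other tuple but yields no result; the names are hypothetical.

```python
CUSTOMER1 = [1]                                       # custkey
ORDERS1 = [(1, 1), (4, 1)]                            # (orderkey, custkey)
LINEITEM1 = [(1, 1), (1, 2), (4, 5), (4, 6), (5, 4)]  # (orderkey, partkey), incl. false drop (5, 4)


def join_partition_group(customers, orders, lineitems):
    """Join one Customer/Orders/Lineitem partition group; assumes orderkey is unique in Orders."""
    cust_keys = set(customers)
    # orderkey -> custkey for orders whose customer is in this partition.
    order_table = {ok: ck for ok, ck in orders if ck in cust_keys}
    results, unmatched = [], []
    for orderkey, partkey in lineitems:
        if orderkey in order_table:
            results.append((order_table[orderkey], orderkey, partkey))
        else:
            # A false drop: it was written, read, and probed, but yields no output.
            unmatched.append((orderkey, partkey))
    return results, unmatched


if __name__ == "__main__":
    rows, wasted = join_partition_group(CUSTOMER1, ORDERS1, LINEITEM1)
    print("results (custkey, orderkey, partkey):", rows)
    print("false drops with no match:", wasted)
```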


SHARP

• Limited to star joins
  – The join graph looks like a star
  – All tables are related to a central table

Example star schema with a central Fact table:
Fact (key, a_key, b_key, c_key, d_key, e_key)
A (a_key, data)
B (b_key, data)
C (c_key, data)
D (d_key, data)
E (e_key, data)


SHARP Example

Customer (id, name): (1, Bob), (2, Joe), (3, Greg), (4, Susan)
Product (id, name): (1, Hammer), (2, Drill), (3, Screwdriver), (4, Scissors), (5, Toolbox), (6, Knife)
Saleitem (c_id, p_id): (1, 1), (1, 2), (2, 3), (2, 6), (3, 1), (3, 5), (2, 5), (4, 1), (3, 6)

SELECT * FROM Customer c, Product p, Saleitem s
WHERE c.id = s.c_id AND p.id = s.p_id;


SHARP Example Partitions

Customer is split into two partitions with the hash function partition = ((id - 1) mod 2) + 1:
Customer1 (id, name): (1, Bob), (3, Greg)
Customer2: (2, Joe), (4, Susan)


SHARP Example Partitions

Product is split into three partitions with the hash function partition = ((id - 1) mod 3) + 1:
Product1 (id, name): (1, Hammer), (4, Scissors)
Product2: (2, Drill), (5, Toolbox)
Product3: (3, Screwdriver), (6, Knife)


SHARP Example Partitions

Saleitem is partitioned on both foreign keys, using the Customer hash function on c_id and the Product hash function on p_id, which gives 2 × 3 = 6 partitions. Saleitem i,j holds the tuples whose c_id falls in Customer partition i and whose p_id falls in Product partition j:

                 p_id mod 3 = 1   p_id mod 3 = 2   p_id mod 3 = 0
c_id mod 2 = 1   Saleitem1,1      Saleitem1,2      Saleitem1,3
c_id mod 2 = 0   Saleitem2,1      Saleitem2,2      Saleitem2,3

Saleitem1,1 (c_id, p_id): (1, 1), (3, 1)
Saleitem1,2: (1, 2), (3, 5)
Saleitem1,3: (3, 6)
Saleitem2,1: (4, 1)
Saleitem2,2: (2, 5)
Saleitem2,3: (2, 3), (2, 6)
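The two-dimensional partitioning of Saleitem can be sketched in Python as follows. The hash functions are the ones from the Customer and Product partitioning steps; the function names are hypothetical.

```python
SALEITEM = [(1, 1), (1, 2), (2, 3), (2, 6), (3, 1), (3, 5), (2, 5), (4, 1), (3, 6)]  # (c_id, p_id)


def customer_partition(c_id):
    return ((c_id - 1) % 2) + 1   # Customer hash function


def product_partition(p_id):
    return ((p_id - 1) % 3) + 1   # Product hash function


def partition_fact(saleitems):
    """Partition the fact table on both foreign keys: one bucket per (i, j) pair."""
    parts = {}
    for c_id, p_id in saleitems:
        key = (customer_partition(c_id), product_partition(p_id))
        parts.setdefault(key, []).append((c_id, p_id))
    return parts


if __name__ == "__main__":
    for (i, j), rows in sorted(partition_fact(SALEITEM).items()):
        print(f"Saleitem{i},{j}: {rows}")
```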


SHARP Partition Combinations

• Customer1, Product1, and Saleitem1,1
• Customer1, Product2, and Saleitem1,2
• Customer1, Product3, and Saleitem1,3
• Customer2, Product1, and Saleitem2,1
• Customer2, Product2, and Saleitem2,2
• Customer2, Product3, and Saleitem2,3

For each partition i of Customer
    For each partition j of Product
        Probe with partition i,j of Saleitem
        Output matches between Customer i, Product j, and Saleitem i,j
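The loop above can be turned into a small runnable Python sketch over the example data. It is an in-memory illustration under simplifying assumptions (dictionaries stand in for the per-partition hash tables, and nothing is written to disk); the names are hypothetical and it is not the thesis implementation.

```python
CUSTOMER = {1: "Bob", 2: "Joe", 3: "Greg", 4: "Susan"}            # id -> name
PRODUCT = {1: "Hammer", 2: "Drill", 3: "Screwdriver", 4: "Scissors", 5: "Toolbox", 6: "Knife"}
SALEITEM = [(1, 1), (1, 2), (2, 3), (2, 6), (3, 1), (3, 5), (2, 5), (4, 1), (3, 6)]  # (c_id, p_id)
C_PARTS, P_PARTS = 2, 3


def part(key, n):
    # Hash function from the example: ((key - 1) mod n) + 1.
    return ((key - 1) % n) + 1


def sharp_join():
    # Partition each dimension table on its key ...
    customer_parts = {i: {} for i in range(1, C_PARTS + 1)}
    for c_id, name in CUSTOMER.items():
        customer_parts[part(c_id, C_PARTS)][c_id] = name
    product_parts = {j: {} for j in range(1, P_PARTS + 1)}
    for p_id, name in PRODUCT.items():
        product_parts[part(p_id, P_PARTS)][p_id] = name
    # ... and the fact table on both foreign keys.
    fact_parts = {}
    for c_id, p_id in SALEITEM:
        fact_parts.setdefault((part(c_id, C_PARTS), part(p_id, P_PARTS)), []).append((c_id, p_id))

    results = []
    for i in range(1, C_PARTS + 1):          # for each partition i of Customer
        cust = customer_parts[i]
        for j in range(1, P_PARTS + 1):      # for each partition j of Product
            prod = product_parts[j]
            # Probe both in-memory partitions with Saleitem i,j.
            for c_id, p_id in fact_parts.get((i, j), []):
                if c_id in cust and p_id in prod:
                    results.append((c_id, cust[c_id], p_id, prod[p_id]))
    return results


if __name__ == "__main__":
    for row in sharp_join():
        print(row)
```

Each fact tuple is read once, and no Customer-Saleitem intermediate result is built before the Product join.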


SHARP Join

Customer1 ((1, Bob), (3, Greg)) and Product1 ((1, Hammer), (4, Scissors)) are loaded into memory and probed with Saleitem1,1 ((1, 1), (3, 1)):
c_id  c_name  p_id  p_name
1     Bob     1     Hammer
3     Greg    1     Hammer


Next, Customer1 and Product2 ((2, Drill), (5, Toolbox)) are probed with Saleitem1,2 ((1, 2), (3, 5)), adding (1, Bob, 2, Drill) and (3, Greg, 5, Toolbox). After all six partition combinations have been processed, the results are:
c_id  c_name  p_id  p_name
1     Bob     1     Hammer
3     Greg    1     Hammer
1     Bob     2     Drill
3     Greg    5     Toolbox
3     Greg    6     Knife
4     Susan   1     Hammer
2     Joe     5     Toolbox
2     Joe     3     Screwdriver
2     Joe     6     Knife


Multi-Way Join Summary

Algorithm and relevant queries:
• Hash Teams: any query performing an inner join on identical attributes in all relations.
• Generalized Hash Teams: any query performing an inner join on direct and indirect attributes. Requires extra memory for indirect queries.
• SHARP: only star queries.


Thesis Questions

The study seeks to answer the following questions:
• Q1: Does Hash Teams provide an advantage over DHJ?
• Q2: Does Generalized Hash Teams provide an advantage over DHJ?
• Q3: Does SHARP provide an advantage over DHJ?
• Q4: Should these algorithms be implemented in a relational database system in addition to the existing binary join algorithms?


Multi-Way Join Implementation

• Performance is implementation dependent
• Multiple implementations were created:
  – PostgreSQL (http://www.postgresql.org/)
  – Standalone C++
  – Verified the results in another environment


Experimental Results


PostgreSQL Results

• All experiments compared the multi-way join against PostgreSQL's built-in hash join, Hybrid Hash Join (HHJ)
• Data was based on the 10 GB TPC-H benchmark data
  – Generated using Microsoft's TPC-H generator: ftp.research.microsoft.com/users/viveknar/tpcdskew


TPC-H Relations

Relation   Tuple Size  Number of Tuples  Relation Size
Customer   194 bytes   1.5 million       284 MB
Supplier   184 bytes   100,000           18 MB
Part       173 bytes   2 million         323 MB
Orders     147 bytes   15 million        2097 MB
PartSupp   182 bytes   8 million         1392 MB
Lineitem   162 bytes   60 million        9270 MB


Hash Teams in PostgreSQL

• Performed a 3-way join on the Orders relation using direct partitioning

[Figure: Hash Teams vs. HHJ in PostgreSQL. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


Generalized Hash Teams in PostgreSQL

• Indirect partitioning with a join on Customer, Orders, and Lineitem
• Tested using multiple mappers:
  – Bitmap
  – Exact


Generalized Hash Teams in PostgreSQL

[Figure: GHT Exact, GHT Bitmap, and HHJ in PostgreSQL. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


SHARP in PostgreSQL

• Star join using Part, Orders, and Lineitem

[Figure: SHARP vs. HHJ in PostgreSQL. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


Standalone C++ Results

• Uses the same TPC-H data as the PostgreSQL experiments


Standalone C++ Hash Teams

• Performed a 3-way join on the Orders relation using direct partitioning

[Figure: DHJ Left, DHJ Right, and Hash Teams. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


Standalone C++ Generalized Hash Teams

• Indirect partitioning with a join on Customer, Orders, and Lineitem
• Tested using the bitmap mapper
• Tested GHT by:
  – Not counting mapper memory
  – Counting mapper memory for small memory sizes
  – Varying the amount of memory available for the mapper


GHT Map Memory Not Counted

[Figure: DHJ Left, DHJ Right, and GHT with map memory not counted. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


GHT at Small Memory Sizes

[Figure: DHJ Left, DHJ Right, and GHT at memory sizes from 0 to 400 MB. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


GHT and Bitmap Size

[Figure: GHT false drops and time for bitmap size multipliers 0.1, 0.25, 0.5, 1, 1.5, 2, and 4. Left panel "False Drops": false drops (millions) vs. bitmap size multiplier. Right panel "Time": time (seconds) vs. bitmap size multiplier.]
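The bitmap size experiment shows false drops falling as the bitmap grows. As a rough way to see why, the standard single-hash bitmap estimate can be computed in Python. This is not a model from the thesis: it assumes uniform hashing and keys spread evenly across partitions, and it interprets the bitmap size multiplier as bits per bitmap divided by keys per bitmap, which is a hypothetical reading. The inputs in the usage block (15 million orderkeys, 10 partitions) are likewise hypothetical.

```python
def expected_false_drops_per_tuple(num_keys, num_partitions, bits_per_bitmap):
    """Rough expected number of wrong partitions a probe tuple is sent to.

    Assumes uniform hashing and that keys are spread evenly over partitions, so each
    bitmap receives about n / P keys and a given bit is still 0 with probability
    (1 - 1/m) ** (n / P). A tuple is falsely dropped into each of the other P - 1
    partitions whose bit happens to be set.
    """
    keys_per_bitmap = num_keys / num_partitions
    p_bit_set = 1 - (1 - 1 / bits_per_bitmap) ** keys_per_bitmap
    return (num_partitions - 1) * p_bit_set


def false_drop_curve(num_keys, num_partitions, multipliers):
    # Hypothetical reading of "bitmap size multiplier": bits per bitmap = multiplier * keys per bitmap.
    keys_per_bitmap = num_keys / num_partitions
    return [
        expected_false_drops_per_tuple(num_keys, num_partitions, max(1, round(m * keys_per_bitmap)))
        for m in multipliers
    ]


if __name__ == "__main__":
    multipliers = [0.1, 0.25, 0.5, 1, 1.5, 2, 4]
    # Hypothetical inputs: 15 million orderkeys spread over 10 partitions.
    for m, rate in zip(multipliers, false_drop_curve(15_000_000, 10, multipliers)):
        print(f"multiplier {m}: about {rate:.2f} false drops per probe tuple")
```

Under these assumptions the fraction of set bits behaves like 1 - e^(-1/multiplier), so doubling the bitmap does not halve the false drops; it only reduces them gradually, at the cost of more mapper memory.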


Standalone C++ SHARP

• Star join using Part, Orders, and Lineitem

[Figure: SHARP vs. DHJ. Left panel "Time": time (seconds) vs. memory size (MB). Right panel "I/O Bytes": I/Os (MB) vs. memory size (MB).]


Conclusions



Does Hash Teams provide an advantage over DHJ?

• Yes
  – Performs fewer I/Os than DHJ
  – Evaluates queries faster
  – Uses memory more efficiently
  – Performs fewer partitioning steps
• Queries that can use Hash Teams are very limited in practice
• In many cases a traditional sort-merge join would be more efficient
• Hash Teams is much more complex to implement and maintain


Does Generalized Hash Teams provide an advantage over DHJ?

• Sometimes
  – When GHT performs fewer I/Os than DHJ
• Performance is poor when there are many false drops
• Much more complex than DHJ or Hash Teams
• The mapper can hurt performance


Does SHARP provide an advantage over DHJ?

• Yes
  – Performs fewer I/Os
  – Evaluates queries quicker
  – Uses memory more efficiently
  – Fewer partitioning steps
• Limited to star queries
• More complex to implement and maintain


Should these algorithms be implemented in a relational database system?

• Hash Teams should not be implemented.
  – Too limited in use
  – Microsoft removed support for Hash Teams from SQL Server 2005
• Generalized Hash Teams should not be implemented.
  – GHT can be much slower than DHJ
  – The mapper makes GHT much more complex to implement and maintain
• SHARP should be implemented.
  – Shows a significant performance advantage
  – Star queries are commonly used in data warehousing


Future Work

• Experiment with the algorithms on different data sets
• Experiment with larger numbers of relations
• Extend the Hash Teams and GHT implementations to support GROUP BY, to see whether that makes them more useful


Thank You


Appendix


TPC-H Relations (http://www.tpc.org/tpch/)