Page 1: To Share or Not to Share?

To Share or Not to Share?

Ryan Johnson

Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi

PARALLEL DATA LABORATORY
Carnegie Mellon University

**HP LABS

Page 2: To Share or Not to Share?

Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 2

Motivation For Work Sharing

[Figure: two query plans over the Student and Dept tables. One plan: scan, scan, join, aggregate, output. The other plan: scan, aggregate, output.]

Query: What is the highest undergraduate GPA?

Query: What is the average GPA in the ECE dept.?

Page 3: To Share or Not to Share?

Motivation For Work Sharing

• Many queries in system
  • Similar requests
  • Redundant work
• Work Sharing
  • Detect redundant work
  • Compute results once and share
• Big win for I/O, uniprocessors

[Figure: the two query plans over the Student and Dept tables (scan, scan, join, aggregate, output; and scan, aggregate, output).]

2x speedup for TPC-H queries [hariz05]

Page 4: To Share or Not to Share?

Work Sharing on Modern Hardware

[Figure: Speedup due to WS (0.0 to 2.0) vs. Shared Queries (0 to 45), with curves for 1 CPU and 8 CPU and a "7x" annotation. Inset diagrams show cores with L1 and L2 caches and memory.]

Work sharing can hurt performance!

Page 5: To Share or Not to Share?

Contributions

• Observation
  • Work sharing can hurt performance on parallel hardware
• Analysis
  • Develop intuitive analytical model of work sharing
  • Identify trade-off between total work, critical path
• Application
  • Model-based policy outperforms static ones by up to 6x

Page 6: To Share or Not to Share?

Outline

• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments

Page 7: To Share or Not to Share?

Challenges of Exploiting Work Sharing

• Independent execution only?
  • Load reduction from work sharing can be useful
• Work sharing only?
  • Indiscriminate application can hurt performance
• To share or not to share?
  • System and workload dependent
  • Adapt decisions at runtime

Must understand work sharing to exploit it fully

Page 8: To Share or Not to Share?

Work Sharing vs. Parallelism

Independent Execution

[Figure: timeline of Query 1 and Query 2 under independent execution, showing Scan, Join, and Aggregate stages, the critical paths, and each query's response time. Annotation: P = 4.33.]

Page 9: To Share or Not to Share?

Work Sharing vs. Parallelism

Shared Execution

[Figure: timeline of Query 1 and Query 2 under shared execution, showing Scan, Join, and Aggregate stages, each query's response time, and a marked "Penalty". Annotations: P = 2.75 and P = 4.33.]

Critical path now longer

Total work and critical path both important

Page 10: To Share or Not to Share?

Understanding Work Sharing

• Performance depends on two factors:

$$\text{Throughput} = f\left(\frac{1}{\text{TotalWork}},\ \frac{1}{\text{CriticalPath}}\right)$$

• Work sharing presents a trade-off
  • Reduces total work
  • Potentially lengthens critical path

Balance both factors or performance suffers

Page 11: To Share or Not to Share?

Basis for a Model

• “Closed” system
  • Consistent high load
  • Throughput computing
  • Assumed in most benchmarks
  • Fixed number of clients
• Little’s Law governs throughput
  • Higher response time = lower throughput
  • Total work not a direct factor!

Load reduction secondary to response time
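
As a minimal sketch of the Little's Law argument on this slide: in a closed system with a fixed number of clients, throughput is the client count divided by the mean response time. The function name and the zero-think-time assumption (every client always has one query outstanding) are illustrative and not taken from the slides.

```python
def closed_system_throughput(clients: int, response_time_s: float) -> float:
    """Little's Law for a closed system: X = N / T.

    Assumes zero think time, i.e. each of the N clients always has
    exactly one query in the system (an assumption of this sketch).
    """
    if clients < 0:
        raise ValueError("clients must be non-negative")
    if response_time_s <= 0:
        raise ValueError("response_time_s must be positive")
    return clients / response_time_s


# Same 20 clients; doubling the response time halves the throughput,
# regardless of how much total work the system performs.
fast = closed_system_throughput(clients=20, response_time_s=2.0)
slow = closed_system_throughput(clients=20, response_time_s=4.0)
print(fast, slow)
```

Total work does not appear in the formula, which is why the slide treats load reduction as secondary to response time.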

Page 12: To Share or Not to Share?

Predicting Response Time

• Case 1: Compute-bound

$$\text{Time} = \frac{u}{n} = \frac{\text{Requested Utilization}}{\text{Available Processors}}$$

• Case 2: Critical path-bound

$$\text{Time} = p_{max} = \text{Delay at slowest pipe stage}$$

• Larger bottleneck determines response time

Model provides u and p_max
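
The two cases can be combined into a single prediction by taking the larger of the two bottlenecks. The sketch below assumes u is expressed in processor-time units and p_max in the same time units; the function name and the sample numbers are hypothetical.

```python
def predict_response_time(u: float, n: int, p_max: float) -> float:
    """Response time as the larger of the two bottlenecks.

    u     -- requested utilization (processor time requested; assumed units)
    n     -- available processors
    p_max -- delay at the slowest pipeline stage
    """
    if n <= 0:
        raise ValueError("n must be positive")
    compute_bound = u / n          # Case 1: limited by processor capacity
    critical_path_bound = p_max    # Case 2: limited by the slowest stage
    return max(compute_bound, critical_path_bound)


# Hypothetical workload: u = 8.0, p_max = 1.5.
few_cpus = predict_response_time(u=8.0, n=2, p_max=1.5)    # compute-bound
many_cpus = predict_response_time(u=8.0, n=32, p_max=1.5)  # critical path-bound
print(few_cpus, many_cpus)
```

With few processors the u/n term dominates; adding processors shrinks it until p_max becomes the limit.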

Page 13: To Share or Not to Share?

An Analytical Model of Work Sharing

Throughput for m queries and n processors:

$$X(m, n) = m \cdot \min\left(\frac{n}{u(m)},\ \frac{1}{p_{max}(m)}\right)$$

• n / u(m): improved by work sharing
• 1 / p_max(m): potentially worsened by work sharing

$$E[X] = \frac{m}{E[T]}$$

u = requested utilization
p_max = longest pipe stage

Sharing helpful when X_shared > X_alone
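
A minimal sketch of the sharing decision, assuming the slide's formula reads X(m, n) = m · min(n / u(m), 1 / p_max(m)). The u and p_max values below are made-up inputs chosen so that sharing halves the requested utilization but doubles the longest pipe stage; they are not measurements from the talk.

```python
def throughput(m: int, n: int, u_m: float, p_max_m: float) -> float:
    """X(m, n) = m * min(n / u(m), 1 / p_max(m)).

    u_m and p_max_m are u(m) and p_max(m) already evaluated for this m.
    """
    return m * min(n / u_m, 1.0 / p_max_m)


def sharing_helps(x_shared: float, x_alone: float) -> bool:
    """Sharing is worthwhile only when it raises predicted throughput."""
    return x_shared > x_alone


M = 4  # queries
# Hypothetical parameters: (u(m), p_max(m)) without and with sharing.
ALONE = (8.0, 0.5)
SHARED = (4.0, 1.0)

for n in (2, 32):
    x_alone = throughput(M, n, *ALONE)
    x_shared = throughput(M, n, *SHARED)
    print(n, x_alone, x_shared, sharing_helps(x_shared, x_alone))
```

With these inputs the same workload favors sharing on 2 processors, where the n / u term is the limit, and favors independent execution on 32 processors, where 1 / p_max is the limit.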

Page 14: To Share or Not to Share?

Outline

• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments

Page 15: To Share or Not to Share?

Experimental Setup

• Hardware
  • Sun T2000 “Niagara” with 16 GB RAM
  • 8 cores (32 threads)
  • Solaris processor sets vary effective CPU count
• Cordoba
  • Staged DBMS
  • Naturally exposes work sharing
  • Flexible work sharing policies
• 1 GB TPC-H dataset
• Fixed client and CPU counts per run

Page 16: To Share or Not to Share?

Predicted vs. Measured Performance

Model Validation: TPC-H Q1

[Figure: Speedup due to WS (0 to 1.4) vs. Shared Queries (0 to 45), with measured and model curves for 1, 2, 8, and 32 CPUs.]

Avg/max error: 5.7% / 22%

Page 17: To Share or Not to Share?

Predicted vs. Measured Performance

Model Validation: TPC-H Q4

[Figure: Speedup due to WS (0 to 30) vs. Clients (0 to 45), with measured and model curves for 1, 2, 8, and 32 CPUs.]

Behavior varies with both system and workload

Page 18: To Share or Not to Share?

Exploring WS vs. Parallelism

• Work sharing splits query into three parts
  • Independent work
    – Per-query, parallel
    – Total work
  • Serial work
    – Per-query, serial
    – Critical path
  • Shared work
    – Computed once
    – “Free” after first query

Example: Serial - 4%, Independent - 37%, Shared - 59%
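
To make the three-way split concrete, the sketch below takes the slide's example fractions (4% serial, 37% independent, 59% shared) and computes how much total work sharing removes for m queries, alongside the serialized time that accumulates. It assumes each query costs one unit of work when run alone and that the serial parts of all m sharers run one after another. These are simplifying assumptions of this sketch, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkSplit:
    serial: float       # per-query, must run serially (critical path)
    independent: float  # per-query, can run in parallel
    shared: float       # computed once for the whole group


def total_work_alone(m: int, split: WorkSplit) -> float:
    """Every query repeats all three parts."""
    return m * (split.serial + split.independent + split.shared)


def total_work_shared(m: int, split: WorkSplit) -> float:
    """Shared part is paid once; the other parts are paid per query."""
    return split.shared + m * (split.serial + split.independent)


def serial_time_shared(m: int, split: WorkSplit) -> float:
    """Serialized time if the m serial parts run back to back (assumption)."""
    return m * split.serial


example = WorkSplit(serial=0.04, independent=0.37, shared=0.59)
for m in (1, 8, 32):
    saved = 1.0 - total_work_shared(m, example) / total_work_alone(m, example)
    print(f"m={m:2d}  work saved={saved:.0%}  "
          f"serialized time={serial_time_shared(m, example):.2f}")
```

The work saved approaches the shared fraction as m grows, while the serialized time grows linearly with m, which is the tension between total work and critical path that the slide describes.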

Page 19: To Share or Not to Share?

Exploring WS vs. Parallelism

[Figure: "Potential Speedup". Benefit from Work Sharing (0 to 2.5) vs. Shared Queries (0 to 32); legend (CPUs): 4, 8, 16, 32.]

Behavior matches previously published results

Page 20: To Share or Not to Share?

Exploring WS vs. Parallelism

[Figure: "Potential Speedup". Benefit from Work Sharing (0 to 2.5) vs. Shared Queries (0 to 32); legend (CPUs): 4, 8; a region is marked "Saturated".]

Page 21: To Share or Not to Share?

Exploring WS vs. Parallelism

[Figure: "Potential Speedup". Benefit from Work Sharing (0 to 2.5) vs. Shared Queries (0 to 32); legend (CPUs): 4, 8, 16, 32; a region is marked "Saturated".]

Page 22: To Share or Not to Share?

Exploring WS vs. Parallelism

[Figure: "Potential Speedup". Benefit from Work Sharing (0 to 2.5) vs. Shared Queries (0 to 32); legend (CPUs): 4, 8, 16, 32; a region is marked "Saturated".]

More processors shift bottleneck to critical path

Page 23: To Share or Not to Share?

Performance Impact of Serial Work

Critical path quickly becomes major bottleneck

[Figure: "Impact of Serial Work (32 CPU)". Benefit from Work Sharing (0 to 2.5) vs. Shared Queries (0 to 40), with curves labeled 0%, 1%, 2%, and 7%.]

Page 24: To Share or Not to Share?

Model-guided Work Sharing

• Integrate predictive model into Cordoba
  • Predict benefit of work sharing for each new query
• Consider multiple groups of queries at once
  • Shorter critical path, increased parallelism
• Experimental setup
  • Profile run with 2 clients, 2 CPUs
  • Extract model parameters with profiling tools
  • 20 clients submit mix of TPC-H Q1 and Q4

Compare against always-, never-share policies
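
The idea of considering multiple sharing groups can be illustrated with a small search over group counts. This is a hedged sketch, not Cordoba's actual policy: it assumes equal-sized groups, reuses the throughput form X = m · min(n / u, 1 / p_max), and takes `u_of` and `p_max_of` as hypothetical profile functions that return u and p_max for a group of k sharing queries.

```python
from typing import Callable

Profile = Callable[[int], float]


def predicted_throughput(m: int, n: int, groups: int,
                         u_of: Profile, p_max_of: Profile) -> float:
    """Throughput when m queries form `groups` equal-sized sharing groups."""
    k = m // groups                 # queries per group (groups must divide m)
    total_u = groups * u_of(k)      # each group pays its shared work once
    return m * min(n / total_u, 1.0 / p_max_of(k))


def best_group_count(m: int, n: int,
                     u_of: Profile, p_max_of: Profile) -> int:
    """Pick the group count with the highest predicted throughput."""
    candidates = [g for g in range(1, m + 1) if m % g == 0]
    return max(candidates,
               key=lambda g: predicted_throughput(m, n, g, u_of, p_max_of))


# Hypothetical profile: sharing saves work but lengthens the slowest stage.
def toy_u(k: int) -> float:
    return 0.6 + 0.4 * k


def toy_p_max(k: int) -> float:
    return 0.2 + 0.05 * k


print(best_group_count(20, 2, toy_u, toy_p_max))   # groups chosen for 2 CPUs
print(best_group_count(20, 32, toy_u, toy_p_max))  # groups chosen for 32 CPUs
```

Here one group corresponds to an always-share policy and m groups to never-share; with this toy profile the search picks a single group on 2 CPUs and an intermediate grouping on 32 CPUs.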

Page 25: To Share or Not to Share?

Comparison of Work Sharing Strategies

Model-based policy balances critical path and load

[Figure: two panels, "2 CPU" and "32 CPU", each plotting throughput (Queries/min) against Query Ratio (All Q1, 50/50, All Q4) for three policies: always share, model guided, never share.]

Page 26: To Share or Not to Share?

Related Work

• Many existing work sharing schemes
  • Identification occurs at different stages in the query’s lifetime
  • All allow pipelined query execution

Schemes by the stage at which sharing is identified, from early to late:

• Schema design: Materialized Views [rouss82]
• Query compilation: Multiple Query Optimization [roy00]
• Query execution: Staged DBMS [hariz05]
• Buffer pool access: Synchronized Scanning [lang07]

Model describes all types of work sharing

Page 27: To Share or Not to Share?

Conclusions

• Work sharing can hurt performance
  • Highly parallel, memory resident machines
• Intuitive analytical model captures behavior
  • Trade-off between load reduction and critical path
• Model-guided work sharing highly effective
  • Outperforms static policies by up to 6x

http://www.cs.cmu.edu/~StagedDB/

Page 28: To Share or Not to Share?

References

• [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005.
• [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007.
• [rouss82] N. Roussopoulos. “View Indexing in Relational Databases.” ACM TODS, 7(2):258-290, 1982.
• [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000.

http://www.cs.cmu.edu/~StagedDB/

