
To Share or Not to Share?

Ryan Johnson

Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli,

Anastasia Ailamaki, Babak Falsafi

PARALLEL DATA LABORATORY, Carnegie Mellon University

**HP LABS

Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/

Motivation For Work Sharing

[Figure: two query plans. One scans Student and Dept, joins them, aggregates, and outputs the result; the other scans Student, aggregates, and outputs the result.]

Query: What is the highest undergraduate GPA?

Query: What is the average GPA in the ECE dept.?


Motivation For Work Sharing

• Many queries in system
  • Similar requests
  • Redundant work

• Work Sharing
  • Detect redundant work
  • Compute results once and share

• Big win for I/O, uniprocessors

[Figure: the two query plans from the previous slide.]

2x speedup for TPC-H queries [hariz05]


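Work Sharing on Modern Hardware

[Figure: speedup due to work sharing (y-axis, 0.0 to 2.0) versus number of shared queries (x-axis, 0 to 45), for 1 CPU and 8 CPUs; one annotation reads "7x".]

[Figure: memory hierarchy diagram showing cores, L1 and L2 caches, and memory.]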

Work sharing can hurt performance!


Contributions

• Observation
  • Work sharing can hurt performance on parallel hardware

• Analysis
  • Develop intuitive analytical model of work sharing
  • Identify trade-off between total work, critical path

• Application
  • Model-based policy outperforms static ones by up to 6x


Outline

• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments


Challenges of Exploiting Work Sharing

• Independent execution only?
  • Load reduction from work sharing can be useful

• Work sharing only?
  • Indiscriminate application can hurt performance

• To share or not to share?
  • System and workload dependent
  • Adapt decisions at runtime

Must understand work sharing to exploit it fully


Work Sharing vs. Parallelism

[Figure: independent execution of Query 1 and Query 2 through Scan, Join, and Aggregate stages, with the critical paths and Query 2's response time marked; P = 4.33.]


Work Sharing vs. Parallelism

[Figure: shared execution of Query 1 and Query 2 through Scan, Join, and Aggregate stages, with a "Penalty" marked; labels P = 2.75 and P = 4.33.]

Critical path now longer

Total work and critical path both important


Understanding Work Sharing

• Performance depends on two factors:

$$\text{Throughput} = f\left(\frac{1}{\text{TotalWork}},\ \frac{1}{\text{CriticalPath}}\right)$$

• Work sharing presents a trade-off
  • Reduces total work
  • Potentially lengthens critical path

Balance both factors or performance suffers


Basis for a Model

• “Closed” system
  • Consistent high load
  • Throughput computing
  • Assumed in most benchmarks
  • Fixed number of clients

• Little’s Law governs throughput
  • Higher response time = lower throughput
  • Total work not a direct factor!

Load reduction secondary to response time
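As a worked illustration of Little's Law in a closed system: with N clients each waiting R seconds per query, throughput is X = N / R. The client count and response times below are made-up numbers, not measurements from the talk, and client think time is assumed to be zero.

```latex
X = \frac{N}{R}: \qquad
\frac{20\ \text{clients}}{4\ \text{s}} = 5\ \text{queries/s}, \qquad
\frac{20\ \text{clients}}{5\ \text{s}} = 4\ \text{queries/s}
```

Throughput falls here only because response time rose; the total amount of work performed does not appear in the formula.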


Predicting Response Time

• Case 1: Compute-bound

$$\text{Time} = \frac{u}{n} = \frac{\text{Requested Utilization}}{\text{Available Processors}}$$

• Case 2: Critical path-bound

$$\text{Time} = p_{\max} = \text{Delay at slowest pipe stage}$$

• Larger bottleneck determines response time

Model provides u and pmax
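A minimal Python sketch of the two cases, assuming response time is the larger of the compute bound u/n and the critical-path bound p_max. The function name, the units, and the sample values are illustrative assumptions, not taken from the talk.

```python
from typing import Tuple


def response_time(u: float, n: int, p_max: float) -> Tuple[float, str]:
    """Return (time, bottleneck) for one query.

    u     -- requested utilization (assumed here to be CPU-seconds of work)
    n     -- available processors
    p_max -- delay at the slowest pipe stage, in seconds
    """
    compute_bound = u / n
    if compute_bound >= p_max:
        return compute_bound, "compute"
    return p_max, "critical path"


# Hypothetical numbers: 8 CPU-seconds of work, slowest stage takes 2 s.
for cpus in (1, 2, 8, 32):
    time, bottleneck = response_time(u=8.0, n=cpus, p_max=2.0)
    print(f"{cpus:>2} CPUs: {time:.2f} s ({bottleneck}-bound)")
```

Adding processors shrinks only the compute bound, so past some CPU count the slowest pipe stage sets the response time.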


An Analytical Model of Work Sharing

Sharing helpful when X_shared > X_alone

Throughput for m queries and n processors:

$$X(m, n) = m \cdot \min\left(\frac{n}{u(m)},\ \frac{1}{p_{\max}(m)}\right)$$

• n / u(m): improved by work sharing
• 1 / p_max(m): potentially worsened by work sharing

$$E[X] = \frac{m}{E[T]}$$

u = requested utilization
p_max = longest pipe stage
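The sharing test can be sketched in Python, reading the throughput formula on this slide as X(m, n) = m · min(n / u, 1 / p_max). The helper names and the parameter values are hypothetical; in the talk, u and p_max come from the model rather than from constants like these.

```python
from typing import Tuple

Params = Tuple[float, float]  # (u, p_max) for one execution strategy


def throughput(m: int, n: int, u: float, p_max: float) -> float:
    """X(m, n) = m * min(n / u, 1 / p_max)."""
    return m * min(n / u, 1.0 / p_max)


def should_share(m: int, n: int, shared: Params, alone: Params) -> bool:
    """Share only when the predicted X_shared exceeds X_alone."""
    return throughput(m, n, *shared) > throughput(m, n, *alone)


# Made-up parameters: sharing cuts requested work (u: 10 -> 4)
# but lengthens the slowest pipe stage (p_max: 1 -> 3).
ALONE: Params = (10.0, 1.0)
SHARED: Params = (4.0, 3.0)

for cpus in (1, 8):
    verdict = "share" if should_share(16, cpus, SHARED, ALONE) else "do not share"
    print(f"{cpus} CPU(s): {verdict}")
```

With these illustrative numbers, sharing wins on 1 CPU, where the compute bound dominates, and loses on 8 CPUs, where the longer pipe stage becomes the limit.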


Outline

• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments


Experimental Setup

• Hardware
  • Sun T2000 “Niagara” with 16 GB RAM
  • 8 cores (32 threads)
  • Solaris processor sets vary effective CPU count

• Cordoba
  • Staged DBMS
  • Naturally exposes work sharing
  • Flexible work sharing policies

• 1 GB TPC-H dataset
• Fixed client and CPU counts per run


Predicted vs. Measured Performance

[Figure: Model Validation: TPC-H Q1. Speedup due to work sharing (y-axis, 0 to 1.4) versus number of shared queries (x-axis, 0 to 45); measured and model-predicted curves for 1, 2, 8, and 32 CPUs.]

Avg/max error: 5.7% / 22%


Predicted vs. Measured Performance

[Figure: Model Validation: TPC-H Q4. Speedup due to work sharing (y-axis, 0 to 30) versus number of clients (x-axis, 0 to 45); measured and model-predicted curves for 1, 2, 8, and 32 CPUs.]

Behavior varies with both system and workload


Exploring WS vs. Parallelism

• Work sharing splits query into three parts
  • Independent work
    – Per-query, parallel
    – Total work
  • Serial work
    – Per-query, serial
    – Critical path
  • Shared work
    – Computed once
    – “Free” after first query

Example: Serial - 4%, Independent - 37%, Shared - 59%
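A rough Python sketch of the total-work side of this split, using the example percentages from the slide. It assumes the shared part is paid once per group while the independent and serial parts are paid once per query; it ignores the critical path entirely, so it is an upper bound on benefit rather than the talk's model. The function name is hypothetical.

```python
def work_reduction(m: int, shared: float, independent: float, serial: float) -> float:
    """Total work for m unshared queries divided by total work when they share."""
    per_query = shared + independent + serial
    work_alone = m * per_query
    # Shared work is computed once; the rest is repeated for every query.
    work_shared = shared + m * (independent + serial)
    return work_alone / work_shared


for m in (1, 8, 32):
    ratio = work_reduction(m, shared=0.59, independent=0.37, serial=0.04)
    print(f"{m:>2} queries: {ratio:.2f}x less total work")
```

Under this accounting the reduction approaches 1 / 0.41 ≈ 2.44 as more queries share, since only the 41% of per-query work that is not shared keeps growing.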


Exploring WS vs. Parallelism

[Figure: benefit from work sharing (y-axis, 0 to 2.5) versus number of shared queries (x-axis, 0 to 32), one curve per CPU count (4, 8, 16, 32), with a "Potential Speedup" line.]

Behavior matches previously published results


[Figure: the same plot built up in steps, first with the 4 and 8 CPU curves, then with the 16 and 32 CPU curves added; annotations read "Potential Speedup" and "Saturated".]

More processors shift bottleneck to critical path


Performance Impact of Serial Work

Critical path quickly becomes major bottleneck

[Figure: Impact of Serial Work (32 CPU). Benefit from work sharing (y-axis, 0 to 2.5), one curve per serial-work fraction: 0%, 1%, 2%, 7%.]


Model-guided Work Sharing

• Integrate predictive model into Cordoba
  • Predict benefit of work sharing for each new query

• Consider multiple groups of queries at once
  • Shorter critical path, increased parallelism

• Experimental setup
  • Profile run with 2 clients, 2 CPUs
  • Extract model parameters with profiling tools
  • 20 clients submit mix of TPC-H Q1 and Q4

Compare against always-, never-share policies
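One way to picture a per-query sharing decision over multiple groups is the greedy Python sketch below. It is not Cordoba's implementation: `place_query`, `toy_predictor`, and every constant are hypothetical, and the predictor is a stand-in for the profiled model described in the talk.

```python
from typing import Callable, List

Groups = List[int]  # size of each sharing group; a group of 1 runs unshared


def place_query(groups: Groups, predict: Callable[[Groups], float]) -> Groups:
    """Place one new query where the predictor expects the highest throughput."""
    candidates = [groups + [1]]          # run the new query on its own
    for i in range(len(groups)):         # or join each existing group in turn
        joined = list(groups)
        joined[i] += 1
        candidates.append(joined)
    return max(candidates, key=predict)


def toy_predictor(n_cpus: int) -> Callable[[Groups], float]:
    """Hypothetical stand-in for the profiled model; all constants are made up."""
    shared, private = 0.59, 0.41         # CPU-seconds: once per group / per query
    base_delay, per_member = 0.1, 0.1    # seconds spent at the slowest stage

    def predict(groups: Groups) -> float:
        total_work = sum(shared + g * private for g in groups)
        longest_stage = max(base_delay + g * per_member for g in groups)
        time = max(total_work / n_cpus, longest_stage)
        return sum(groups) / time        # predicted queries per second

    return predict


def run(n_cpus: int, n_queries: int) -> Groups:
    """Submit n_queries one at a time and return the resulting group sizes."""
    groups: Groups = []
    predict = toy_predictor(n_cpus)
    for _ in range(n_queries):
        groups = place_query(groups, predict)
    return groups


print(run(n_cpus=1, n_queries=8))
print(run(n_cpus=32, n_queries=8))
```

With these made-up constants the sketch puts all eight queries into one sharing group on 1 CPU and keeps them in separate groups on 32 CPUs, which mirrors the always-share and never-share extremes that the model-guided policy is compared against.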


Comparison of Work Sharing Strategies

Model-based policy balances critical path and load

[Figure: two panels, 2 CPU and 32 CPU. Throughput in queries/min (y-axis, 0 to 200 for 2 CPU and 0 to 250 for 32 CPU) versus query ratio (x-axis: All Q1, 50/50, All Q4) for three policies: always share, model guided, never share.]


Related Work

• Many existing work sharing schemes
  • Identification occurs at different stages in the query’s lifetime
  • All allow pipelined query execution

[Figure: work sharing schemes placed on a timeline from Early to Late in the query's lifetime, with stages labeled Schema design, Query compilation, Query execution, and Buffer Pool Access. Schemes shown: Materialized Views [rouss82], Multiple Query Optimization [roy00], Staged DBMS [hariz05], Synchronized Scanning [lang07].]

Model describes all types of work sharing


Conclusions

• Work sharing can hurt performance
  • Highly parallel, memory resident machines

• Intuitive analytical model captures behavior
  • Trade-off between load reduction and critical path

• Model-guided work sharing highly effective
  • Outperforms static policies by up to 6x

http://www.cs.cmu.edu/~StagedDB/


References

• [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005.

• [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007.

• [rouss82] N. Roussopoulos. “View Indexing in Relational Databases.” In ACM TODS, 7(2):258-290, 1982.

• [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000.
