To Share or Not to Share?
Ryan Johnson
Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli,
Anastasia Ailamaki, Babak Falsafi
PARALLEL DATA LABORATORYCarnegie Mellon University
**HP LABS
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 2
Motivation For Work Sharing
scan
join
aggregate
scan
output
StudentDept
Query: What is the highest undergraduate GPA?
aggregate
output
scan
Student
Query: What is the average GPA in the ECE dept.?
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 3
Motivation For Work Sharing• Many queries in
system• Similar requests• Redundant work
• Work Sharing• Detect
redundant work• Compute results
once and share
• Big win for I/O, uniprocessors
scan
join
aggregate
scan
output
StudentDept
aggregate
output
scan
Student
2x speedup for TPC-H queries [hariz05]
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 4
0.0
0.5
1.0
1.5
2.0
0 15 30 45Shared Queries
Sp
eed
up
du
e to
WS
1 CPU
7x
Work Sharing on Modern Hardware
8 CPU
Memory
L1
L2
core
L1
L2
core
L2
Memory
Work sharing can hurt performance!
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 5
Contributions
• Observation• Work sharing can hurt performance on parallel
hardware
• Analysis• Develop intuitive analytical model of work sharing• Identify trade-off between total work, critical path
• Application• Model-based policy outperforms static ones by
up to 6x
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 6
Outline
• Introduction• Part I: Intuition and Model• Part II: Analysis and Experiments
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 7
Challenges of Exploiting Work Sharing
• Independent execution only?• Load reduction from work sharing can be useful
• Work sharing only?• Indiscriminate application can hurt performance
• To share or not to share? • System and workload dependent • Adapt decisions at runtime
Must understand work sharing to exploit it fully
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 8
Work Sharing vs. Parallelism
Query 2
Independent Execution
Query 1
Query 2 response time
Query 1 response time
Scan
Join
Aggregate
P = 4.33
Critical Paths
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 9
Work Sharing vs. Parallelism
Query 2
Query 1
Query 2 response time
Query 1 response time
Shared Execution
Penalty
Scan
Join
Aggregate
P = 2.75
P = 4.33
Critical path now longer
Total work and critical path both important
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 10
Understanding Work Sharing
• Performance depends on two factors:
• Work sharing presents a trade-off• Reduces total work• Potentially lengthens critical path
Balance both factors or performance suffers
thCriticalPaTotalWorkfThroughput
1,
1
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 11
Basis for a Model
• “Closed” system• Consistent high load• Throughput computing• Assumed in most benchmarks• Fixed number of clients
• Little’s Law governs throughput• Higher response time = lower throughput• Total work not a direct factor!
Load reduction secondary to response time
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 12
Predicting Response Time
• Case 1: Compute-bound
• Case 2: Critical path-bound
• Larger bottleneck determines response time
stage pipeslowest at Delay max pTime
Processors Available
on UtilizatiRequested
n
uTime
Model provides u and pmax
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 13
An Analytical Model of Work Sharing
Sharing helpful when Xshared > Xalone
)(
1,
)(min),(
max mpmu
nmnmX
Throughput for m queries and n processors
Potentially worsened by work sharing
][][
TE
mXE
Improved by work sharing
U = requested utilizationPmax = longest pipe stage
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 14
Outline
• Introduction• Part I: Intuition and Model• Part II: Analysis and Experiments
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 15
Experimental Setup• Hardware
• Sun T2000 “Niagara” with 16 GB RAM• 8 cores (32 threads) • Solaris processor sets vary effective CPU count
• Cordoba• Staged DBMS• Naturally exposes work sharing• Flexible work sharing policies
• 1GB TPCH dataset• Fixed Client and CPU counts per run
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 16
Predicted vs. Measured Performance
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 15 30 45Shared Queries
Sp
eed
up
du
e to
WS 1 CPU
1 CPU model
2 CPU2 CPU model
8 CPU8 CPU model
32 CPU32 CPU model
Model Validation: TPCH Q1
Avg/max error: 5.7% / 22%
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 17
Predicted vs. Measured Performance
0
10
20
30
0 15 30 45Clients
Sp
eed
up
du
e to
WS 1 CPU
1 CPU model
2 CPU2 CPU model
8 CPU8 CPU model
32 CPU32 CPU model
Model Validation: TPCH Q4
Behavior varies with both system and workload
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 18
Exploring WS vs. Parallelism• Work sharing splits query
into three parts• Independent work
– Per-query, parallel– Total work
• Serial work– Per-query, serial– Critical path
• Shared work– Computed once– “Free” after first query
Serial - 4% Independent - 37%
Shared - 59%
Example:
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 19
0
0.5
1
1.5
2
2.5
0 8 16 24 32Shared Queries
Ben
efit
fro
m W
ork
Sh
arin
g
4
8
16
32
Exploring WS vs. Parallelism
Behavior matches previously published results
Potential Speedup
CPUs
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 20
0
0.5
1
1.5
2
2.5
0 8 16 24 32Shared Queries
Ben
efit
fro
m W
ork
Sh
arin
g
4
8
Exploring WS vs. Parallelism
Potential Speedup
CPUs
Saturated
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 21
0
0.5
1
1.5
2
2.5
0 8 16 24 32Shared Queries
Ben
efit
fro
m W
ork
Sh
arin
g
4
8
16
32
Exploring WS vs. Parallelism
Potential Speedup
CPUs
Saturated
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 22
0
0.5
1
1.5
2
2.5
0 8 16 24 32Shared Queries
Ben
efit
fro
m W
ork
Sh
arin
g
4
8
16
32
Exploring WS vs. Parallelism
More processors shift bottleneck to critical path
Saturated
Potential Speedup
CPUs
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 23
Performance Impact of Serial Work
Critical path quickly becomes major bottleneck
Impact of Serial Work (32 CPU)
0
0.5
1
1.5
2
2.5
0 10 20 30 40Shared Queries
Ben
efit
fro
m W
ork
Sh
arin
g
0%
1%
2%7%
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 24
Model-guided Work Sharing
• Integrate predictive model into Cordoba• Predict benefit of work sharing for each new query
• Consider multiple groups of queries at once• Shorter critical path, increased parallelism
• Experimental setup• Profile run with 2 clients, 2 CPUs• Extract model parameters with profiling tools• 20 clients submit mix of TPCH Q1 and Q4
Compare against always-, never-share policies
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 25
Comparison of Work Sharing Strategies
Model-based policy balances critical path and load
2 CPU
0
50
100
150
200
All Q1 50/50 All Q4
Query Ratio
Qu
eri
es/m
in
always sharemodel guidednever share
32 CPU
0
50
100
150
200
250
Qu
eri
es/m
inAll Q1 50/50 All Q4
Query Ratio
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 26
Related Work
• Many existing work sharing schemes• Identification occurs at different stages in the
query’s lifetime• All allow pipelined query execution
Materialized Views
[rouss82]
Multiple Query Optimization
[roy00]
Staged DBMS[hariz05]
Synchronized Scanning[lang07]
Schema design
Buffer Pool Access
Query compilation
Query execution
Early Late
Model describes all types of work sharing
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 27
Conclusions
• Work sharing can hurt performance• Highly parallel, memory resident machines
• Intuitive analytical model captures behavior• Trade-off between load reduction and critical path
• Model-guided work sharing highly effective• Outperforms static policies by up to 6x
http://www.cs.cmu.edu/~StagedDB/
Ryan Johnson © Apr 19, 2023 http://www.pdl.cmu.edu/ 28
References
•[hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005.
•[lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007.•[rouss82] N. Roussopoulos. “View Indexing in Relational databases.” In ACM TODS, 7(2):258-290, 1982.•[roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000.
http://www.cs.cmu.edu/~StagedDB/