View
6
Download
0
Category
Preview:
Citation preview
1
Optimal Join Algorithms meet Top-๐SIGMOD 2020 tutorial
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Northeastern University, Boston
Slides: https://northeastern-datalab.github.io/topk-join-tutorial/DOI: https://doi.org/10.1145/3318464.3383132Data Lab: https://db.khoury.northeastern.edu
Ranked results
Time
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details
Part 2 : Optimal Join Algorithms
2
Outline tutorial
โข Part 1: Top-๐ (Wolfgang): ~20minโข Part 2: Optimal Join Algorithms (Mirek): ~30min
โ Lower Bound and the Yannakakis Algorithm
โ Problems Caused by Cycles
โ Tree Decompositions
โ Summary and Further Reading
โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min
3
Basic Terminology and Assumptions
โข Terminology
- Full conjunctive query (CQ)
โข Natural join of ๐ relations with O(๐) tuples each
โข E.g.: ๐ ๐ด1, ๐ด2, ๐ด3, ๐ด4 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4โข Any selections comparing attributes to constants, e.g., ๐ด4 < 1
- Query size: O(๐)
- Output cardinality: ๐
โข Assumptions
- No pre-computed data structures such as indexes, sorted representation, materialized views
4
Complexity Notation
โข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation
โข Common practice: focus on data complexity
- We care about scalability in data size
โข Treat query size ๐ as a constant
- E.g., O ๐ ๐ โ ๐๐(๐) + log๐ ๐(๐) โ ๐ simplifies to O ๐๐(๐) + log๐ ๐(๐) โ ๐
5
Complexity Notation
โข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation
โข Common practice: focus on data complexity
- We care about scalability in data size
โข Treat query size ๐ as a constant
- E.g., O ๐ ๐ โ ๐๐(๐) + log๐ ๐(๐) โ ๐ simplifies to O ๐๐(๐) + log๐ ๐(๐) โ ๐
โข We mostly use เทฉO-notation (soft-O) data complexity
- Abstracts away polylog factors in input size that clutter formulas
- E.g., O ๐๐(๐) + log๐ ๐(๐) โ ๐ further simplifies to เทฉO ๐๐(๐) + ๐
6
Outline tutorial
โข Part 1: Top-๐ (Wolfgang): ~20minโข Part 2: Optimal Join Algorithms (Mirek): ~30min
โ Lower Bound and the Yannakakis Algorithm
โ Problems Caused by Cycles
โ Tree Decompositions
โ Summary and Further Reading
โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min
7
Lower Bound for Any Query
โข Need to read entire input at least once: ฮฉ(๐๐)
- ฮฉ(๐) data complexity
โข Need to output every result, each of size ๐: ฮฉ(๐๐)
- ฮฉ(๐) data complexity
โข Together: ฮฉ(๐ + ๐) time complexity to compute any CQ
โข Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs
- Time complexity เทฉO ๐ + ๐
8
Yannakakis Algorithm
โข Given: acyclic conjunctive query Q as a rooted join tree
โข Step 1: semi-join reduction (two sweeps)
- Semi-join reduction sweep from the leaves to root
- Semi-join reduction sweep from root to the leaves
โข Step 2: use the join tree as the query plan
- Compute the joins bottom up, with early projections
[Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDBโ81] https://dl.acm.org/doi/10.5555/1286831.1286840
9
Database TheoryYannakakis Algorithm โ Example ๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ด2 ๐ด1, ๐ด2
๐ด1, ๐ด2
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
10
Database TheoryYannakakis Algorithm โ Example
1. Bottom-up traversal (semi-joins)
๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3
๐ด2
๐ด1, ๐ด2
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
๐น๐ โ ๐น๐
๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ด1, ๐ด2
11
Database TheoryYannakakis Algorithm โ Example
1. Bottom-up traversal (semi-joins)
๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ด2
๐ด1, ๐ด2
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
๐น๐ โ ๐น๐
๐จ๐ ๐จ๐
1 101 202 20
12
Database TheoryYannakakis Algorithm โ Example
1. Bottom-up traversal (semi-joins)
๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ด2
๐ด1, ๐ด2
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
๐น๐ โ ๐น๐
๐จ๐ ๐จ๐
1 101 202 20
13
Database TheoryYannakakis Algorithm โ Example
1. Bottom-up traversal (semi-joins)
๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ด1, ๐ด2
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
๐น๐ โ ๐น๐
๐จ๐ ๐จ๐
1 101 202 20
๐น๐ โ ๐น๐
๐จ๐
102030
14
Database TheoryYannakakis Algorithm โ Example
1. Bottom-up traversal (semi-joins)
๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
๐น๐ โ ๐น๐
๐จ๐ ๐จ๐
1 101 202 20
๐น๐ โ ๐น๐
๐จ๐
102030
๐น๐ โ ๐น๐
๐จ๐ ๐จ๐
1 101 20
15
Database TheoryYannakakis Algorithm โ Example
1. Bottom-up traversal (semi-joins)
2. Top-down traversal (semi-joins)
๐จ๐ ๐จ๐
1 201 104 60
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 1003 10 3001 40 3002 30 200
๐ 2
๐จ๐
102030
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 20002 20 2000
๐ 4
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
๐น๐ โ ๐น๐๐น๐ โ ๐น๐
๐น๐ โ ๐น๐
16
Database TheoryYannakakis Algorithm โ Example ๐จ๐ ๐จ๐
1 201 10
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 100
๐ 2
๐จ๐
1020
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 2000
๐ 4
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
1. Bottom-up traversal (semi-joins)
2. Top-down traversal (semi-joins)
3. Join bottom-up
17
Database TheoryYannakakis Algorithm โ Example ๐จ๐ ๐จ๐
1 201 10
๐ 1
๐จ๐ ๐จ๐ ๐จ๐
1 10 1001 20 100
๐ 2
๐จ๐
1020
๐ 3 ๐จ๐ ๐จ๐ ๐จ๐
1 10 10001 20 10001 20 2000
๐ 4
๐ = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด1, ๐ด2, ๐ด3 โ ๐ 3 ๐ด2 โ ๐ 4 ๐ด1, ๐ด2, ๐ด4
1. Bottom-up traversal (semi-joins)
2. Top-down traversal (semi-joins)
3. Join bottom-up
In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa!
18
Yannakakis Algorithm โ Properties
โข Semi-join sweeps take เทฉO ๐
โข A join step can never shrink intermediate result size
- This does not hold for all trees
- Tree must be attribute-connected (more on this soon)
โข Hence all intermediate results are of size O ๐
โข Each join step therefore has O ๐ + ๐ input and O ๐ output
โข Easy to compute a binary join with O ๐ + ๐ input and O ๐ output in time เทฉO ๐ + ๐ , e.g., using sort-merge join
19
Outline tutorial
โข Part 1: Top-๐ (Wolfgang): ~20minโข Part 2: Optimal Join Algorithms (Mirek): ~30min
โ Lower Bound and the Yannakakis Algorithm
โ Problems Caused by Cycles
โ Tree Decompositions
โ Summary and Further Reading
โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min
20
CQs with Cycles
โข 3-path: ๐3๐ = ๐ 1(๐ด1, ๐ด2) โ ๐ 2(๐ด2, ๐ด3) โ ๐ 3(๐ด3, ๐ด4)
โข 3-cycle: ๐3๐ = ๐ 1(๐ด1, ๐ด2) โ ๐ 2(๐ด2, ๐ด3) โ ๐ 3(๐ด3, ๐ด1)
๐3๐ ๐3๐
๐ด1๐ด2
๐ด2 ๐ด3
๐ด3๐ด4
๐ด1 ๐ด2
๐ด3
๐ด2
๐ด3
๐ด1
21
CQs with Cycles
โข 3-path: ๐3๐ = ๐ 1(๐ด1, ๐ด2) โ ๐ 2(๐ด2, ๐ด3) โ ๐ 3(๐ด3, ๐ด4)
โข 3-cycle: ๐3๐ = ๐ 1(๐ด1, ๐ด2) โ ๐ 2(๐ด2, ๐ด3) โ ๐ 3(๐ด3, ๐ด1)
โข Already semi-join-reduced input
๐จ๐ ๐จ๐
1 1
2 1
โฆ โฆ
n 1
๐จ๐ ๐จ๐
1 1
1 2
โฆ โฆ
1 n
๐จ๐ *
1 1
2 2
โฆ โฆ
n n
๐ 1
๐ 1
๐ 2
Join tree
๐ 3
๐ 2 ๐ 3
22
CQs with Cycles
โข 3-path: ๐3๐ = ๐ 1(๐ด1, ๐ด2) โ ๐ 2(๐ด2, ๐ด3) โ ๐ 3(๐ด3, ๐ด4)
โข 3-cycle: ๐3๐ = ๐ 1(๐ด1, ๐ด2) โ ๐ 2(๐ด2, ๐ด3) โ ๐ 3(๐ด3, ๐ด1)
โข Already semi-join reduced in the example
โข ๐ 1 โ ๐ 2 produces ๐2 intermediate results
- Final output size: ๐2 for ๐3๐, but only ๐ for ๐3๐
๐จ๐ ๐จ๐
1 1
2 1
โฆ โฆ
n 1
๐จ๐ ๐จ๐
1 1
1 2
โฆ โฆ
1 n
๐จ๐ *
1 1
2 2
โฆ โฆ
n n
๐ 1
๐ 1
๐ 2
Join tree
๐ 3
๐ 2 ๐ 3
23
What Went Wrong?
โข The tree for the 3-cycle is not attribute-connected!
โข Attribute-connectedness:
- For each attribute, the nodes containing it form a connected sub-graph
โข In the right tree, ๐ด1 violates this property
๐ 1(๐ด1, ๐ด2)
๐ 2(๐ด2, ๐ด3)
๐ 3(๐ด3, ๐ด4)๐3๐ ๐3๐
๐ 1(๐ด1, ๐ด2)
๐ 2(๐ด2, ๐ด3)
๐ 3(๐ด3, ๐ด1)
24
Solutions for Cycles? Some Bad News
โข Maybe we just need an algorithm that is better suited for cyclic CQs?
โข Yes, butโฆ
โข โฆ [Ngo+ 18]:
- เทฉO ๐ + ๐ unattainable based on well-accepted complexity-theoretic assumptions
[Ngo+ 18] Ngo, Porat, Rรฉ, Rudra. Worst-case optimal join algorithms. J. ACMโ18 https://doi.org/10.1145/3180143
25
What Can Be Done?
โข Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]
โข Instead of เทฉO ๐ + ๐ , getเทฉO ๐ + ๐WC = เทฉO ๐WC
โข ๐WC = largest output of Q over any possible DB instance
- Determined by the AGM bound[4]
- Based on fractional edge cover of the join hypergraph
โข 3-cycle: ๐1.5 vs naive upper bound ๐3
[Ngo+ 18] Ngo, Porat, Rรฉ, Rudra. Worst-case optimal join algorithms. J. ACMโ18 https://doi.org/10.1145/3180143
[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDTโ14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.โ14 https://doi.org/10.1145/2590989.2590991
[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.โ13 https://doi.org/10.1137/110859440
26
What Can Be Done?
โข Worst-case-optimal (WCO) join algorithms[Veldhuizen 14, Ngo+ 14, Ngo+ 18]
โข Instead of เทฉO ๐ + ๐ , getเทฉO ๐ + ๐WC = เทฉO ๐WC
โข ๐WC = largest output of Q over any possible DB instance
- Determined by the AGM bound[4]
- Based on fractional edge cover of the join hypergraph
โข 3-cycle: ๐1.5 vs naive upper bound ๐3
โข Tree decompositions
โข Put more effort into pre-processing to avoid large intermediate results
- Use WCO joins as sub-routine
โข Goal: เทฉO ๐๐ + ๐ for smallest ๐possible
- เทฉO ๐๐ captures pre-processing cost
- ๐ = 1 for acyclic CQ
[Ngo+ 18] Ngo, Porat, Rรฉ, Rudra. Worst-case optimal join algorithms. J. ACMโ18 https://doi.org/10.1145/3180143
[Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDTโ14 https://doi.org/10.5441/002/icdt.2014.13[Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.โ14 https://doi.org/10.1145/2590989.2590991
[Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.โ13 https://doi.org/10.1137/110859440
27
Outline tutorial
โข Part 1: Top-๐ (Wolfgang): ~20minโข Part 2: Optimal Join Algorithms (Mirek): ~30min
โ Lower Bound and the Yannakakis Algorithm
โ Problems Caused by Cycles
โ Tree Decompositions
โ Summary and Further Reading
โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min
28
Main Idea of Tree Decompositions
โข Convert cyclic CQ to a rooted tree-shaped CQ
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
S = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3 โ ๐ 3 ๐ด3, ๐ด4
T = ๐ 4 ๐ด4, ๐ด5 โ ๐ 5 ๐ด5, ๐ด6 โ ๐ 6(๐ด6, ๐ด1)
decomposition
29
Main Idea of Tree Decompositions
โข Convert cyclic CQ to a rooted tree-shaped CQ
โข Materialize all tree nodes (โbagsโ) using a WCO join algorithm
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
S = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3 โ ๐ 3 ๐ด3, ๐ด4
T = ๐ 4 ๐ด4, ๐ด5 โ ๐ 5 ๐ด5, ๐ด6 โ ๐ 6(๐ด6, ๐ด1)
decomposition
S
T
30
Main Idea of Tree Decompositions
โข Convert cyclic CQ to a rooted tree-shaped CQ
โข Materialize all tree nodes (โbagsโ) using a WCO join algorithm
โข Apply Yannakakis algorithm on the tree
- Acyclic CQ whose input relations are the bags
- Achieves เทฉO ๐ฅ + ๐ where ๐ฅ is the size of the largest bag
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
S = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3 โ ๐ 3 ๐ด3, ๐ด4
T = ๐ 4 ๐ด4, ๐ด5 โ ๐ 5 ๐ด5, ๐ด6 โ ๐ 6(๐ด6, ๐ด1)
decomposition
S
T
Yannakakis
31
Tree Decomposition Intuition
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
32
Tree Decomposition Intuition
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
33
Tree Decomposition Intuition
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
34
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
๐ฏ1
Bag materialization costs O(๐3) (AGM bound)
35
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
๐ฏ1
Bag materialization costs O(๐3) (AGM bound)
36
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4๐ฏ2
๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
Bag materialization costs O(๐2) (AGM bound)
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
37
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4๐ฏ2
๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
Bag materialization costs O(๐2) (AGM bound)
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
38
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐ฏ3 ๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3
๐ 3 ๐ด3, ๐ด4
๐ 4 ๐ด4, ๐ด5
๐ 5 ๐ด5, ๐ด6
๐ 6(๐ด6, ๐ด1)
O(๐) bag materializationโฆ?
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
39
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐ฏ3 ๐ 1 ๐จ๐, ๐ด2
๐ 2 ๐ด2, ๐ด3
๐ 3 ๐ด3, ๐ด4
๐ 4 ๐ด4, ๐ด5
๐ 5 ๐ด5, ๐ด6
๐ 6(๐ด6, ๐จ๐)
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
40
Tree Decomposition Intuition
Every relation appearing in the query is covered by a bag (tree node)
For each attribute, the bags containing it are connected
๐ฏ3 ๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1 ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1 ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1 ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1 ๐ด1, _
๐ 6(๐ด6, ๐ด1)
O ๐ โ ๐๐ด1 ๐ 1 bag materialization: still O(๐2)
๐6๐ ๐ด1, โฆ , ๐ด6 = ๐ 1 ๐ด1, ๐ด2 โ ๐ 2 ๐ด2, ๐ด3โ ๐ 3 ๐ด3, ๐ด4 โ ๐ 4 ๐ด4, ๐ด5โ ๐ 5(๐ด5, ๐ด6) โ ๐ 6(๐ด6, ๐ด1)
41
Tree Decomposition: Formal Definition
โข Given: hypergraph โ = (๐ฑ, โฐ)
- ๐ฑ: attributes
โข E.g., ๐ด1, ๐ด2, ๐ด3, ๐ด4, ๐ด5, ๐ด6
- โฐ: relations
โข E.g., ๐ 3 is hyperedge (๐ด3, ๐ด4)
โข A tree decomposition of โ is a pair ๐ฏ, ๐ where
- ๐ฏ = ๐ ๐ฏ ,๐ธ(๐ฏ) is a tree
- ๐:๐ ๐ฏ โ 2๐ฑ assigns a bag ๐(๐ฃ) to each tree node ๐ฃ such that
โข Each hyperedge ๐น โ โฐ is covered, i.e., โ๐น โ โฐ: โ๐ฃ โ ๐ ๐ฏ : ๐น โ ๐(๐ฃ)
โข For each ๐ข โ ๐ฑ, the bags containing ๐ข are connected
A1
A4
A3
A2A6
A5
R6 R1
R2
R3R4
R5
๐ฏ3 ๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1 ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1 ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1 ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1 ๐ด1, _
๐ 6(๐ด6, ๐ด1)
[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODSโ17] https://doi.org/10.1145/3034786.3056105
42
Tree-Decomposition Properties
โข Query has multiple decompositionsโwhich is best?
43
Tree-Decomposition Properties
โข Query has multiple decompositionsโwhich is best?
โข Consider a tree with O(๐) nodes, each materialized using WCO join
- Worst-case output of bag ๐ is of size O(๐๐๐) for some ๐๐ โฅ 1 (AGM bound)
- Fractional hypertree width (fhw) ๐ = max๐
๐๐ [Grohe+ 14]
- Total bag-materialization cost: O(๐๐)
- Size of a materialized bag: O(๐๐)
- Resulting cost for Yannakakis algorithm on materialized tree: เทฉO(๐๐ + ๐)
[Grohe+ 14] Grohe and Marx. Constraint solving via fractional edge covers. ACM TALGโ14. https://doi.org/10.1145/2636918
44
Who Wins?๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
๐ฏ1
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4๐ฏ2
๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
๐ฏ3 ๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1 ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1 ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1 ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1 ๐ด1, _
๐ 6(๐ด6, ๐ด1)
๐ฃ๐ > ๐ฃ๐ = ๐ฃ๐
O(๐2)
O(๐2)
O(๐3)
45
A Closer Look
โข ๐ฏ1 loses, because it does not decompose the query
46
A Closer Look
โข ๐ฏ1 loses, because it does not decompose the query
โข Are ๐ฏ2 and ๐ฏ3 really equally good?
- In ๐ฏ2, bag computation requires joining 3 relations
- In ๐ฏ3, at most 2 relations are joined
โข One of them is just the set of distinct ๐ด1-values in ๐ 1
47
A Closer Look
โข ๐ฏ1 loses, because it does not decompose the query
โข Are ๐ฏ2 and ๐ฏ3 really equally good?
- In ๐ฏ2, bag computation requires joining 3 relations
- In ๐ฏ3, at most 2 relations are joined
โข One of them is just the set of distinct ๐ด1-values in ๐ 1
โข ๐ฏ3 is better when the number of distinct ๐ด1-values in ๐ 1 is low, e.g., ๐(๐2/3) instead of ๐ ๐
48
Who Wins?
๐ฏ3
๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1 ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1 ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1 ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1 ๐ด1, _
๐ 6(๐ด6, ๐ด1)
Degree constraint: ๐๐ด1 ๐ 1 โค ๐2/3
O(๐)
O(๐)
O(๐5/3)
O(๐5/3)
O(๐5/3)
O(๐5/3)
โThe number of distinct ๐ด1values in ๐ 1 is at most ๐2/3โ
49
Who Wins?
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4
๐ฏ2
๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
๐ฏ3
๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1 ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1 ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1 ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1 ๐ด1, _
๐ 6(๐ด6, ๐ด1)
Degree constraint: ๐๐ด1 ๐ 1 โค ๐2/3
O(๐)
O(๐)
O(๐5/3)
O(๐5/3)
O(๐5/3)
O(๐5/3)
O(๐2)
O(๐2)
50
Could ๐ฏ2 Win?
โข Consider bag ๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4 in ๐ฏ2
โข What if each ๐ 1-tuple joins with only โa fewโ ๐ 2-tuples?
โข What if each ๐ 2-tuple joins with only โa fewโ ๐ 3-tuples?
โข What if โa fewโ was ๐1/3?
51
Who Wins Now?
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4
๐ฏ2
๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
Degree constraint: โ๐ โ {2,3,5,6}:
โ๐: ๐๐ด๐+1๐๐ด๐=๐ ๐ ๐ โค ๐1/3
O(๐5/3)
O(๐5/3)
52
Who Wins Now?
๐ 1 ๐ด1, ๐ด2 , ๐ 2 ๐ด2, ๐ด3 , ๐ 3 ๐ด3, ๐ด4
๐ฏ2
๐ 4 ๐ด4, ๐ด5 , ๐ 5 ๐ด5, ๐ด6 , ๐ 6(๐ด6, ๐ด1)
๐ฏ3
๐ 1 ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1 ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1 ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1 ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1 ๐ด1, _
๐ 6(๐ด6, ๐ด1)
Degree constraint: โ๐ โ {2,3,5,6}:
โ๐: ๐๐ด๐+1๐๐ด๐=๐ ๐ ๐ โค ๐1/3
O(๐)
O(๐)
O(๐2)
O(๐2)
O(๐2)
O(๐2)
O(๐5/3)
O(๐5/3)
53
Best of Both Worlds
โข Depending on the degree constraints that hold for a DB instance, we may sometimes prefer ๐ฏ2 and sometimes ๐ฏ3
โข What if we used both? [Alon+ 97, Marx 13]
- Intuition: each decomposition is a different query โplanโ
โข Query output = union of individual plansโ results
- Decide for each input tuple to which plan(s) to send it
- Main idea: split each input relation into heavy and light
โข Goal: enforce desirable degree constraints for each tree
[Marx 13] Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACMโ13 https://doi.org/10.1145/2535926
[Alon+ 97] Alon, Yuster, Zwick. Finding and counting given length cycles. Algorithmicaโ97 https://doi.org/10.1007/BF02523189
54
Multiple Plans: Plan 1
๐ฏ3
๐ 1๐ป ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1๐ป ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1๐ป ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1๐ป ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1๐ป ๐ด1, _
๐ 6(๐ด6, ๐ด1)
๐ 1๐ป: contains all tuples whose
๐ด1-values occur more than
๐1/3 times (fewer than ๐2/3
such ๐ด1-values exist)
55
Multiple Plans: Plan 1
๐ฏ3: computes ๐ 1๐ป โ ๐ 2 โ โฏ โ ๐ 6
๐ 1๐ป ๐ด1, ๐ด2
๐ 2 ๐ด2, ๐ด3 , ๐ 1๐ป ๐ด1, _
๐ 3 ๐ด3, ๐ด4 , ๐ 1๐ป ๐ด1, _
๐ 4 ๐ด4, ๐ด5 , ๐ 1๐ป ๐ด1, _
๐ 5 ๐ด5, ๐ด6 , ๐ 1๐ป ๐ด1, _
๐ 6(๐ด6, ๐ด1)
Degree constraint: ๐๐ด1 ๐ 1๐ป โค ๐2/3
O(๐)
O(๐)
O(๐5/3)
O(๐5/3)
O(๐5/3)
O(๐5/3)
๐ 1๐ป: contains all tuples whose
๐ด1-values occur more than
๐1/3 times (fewer than ๐2/3
such ๐ด1-values exist)
56
More Plans
โข Note that
- ๐6๐ = ๐ 1 โ ๐ 2 โ ๐ 3 โ ๐ 4 โ ๐ 5 โ ๐ 6 together with
- ๐ 1๐ฟ = ๐ 1 โ ๐ 1
๐ป
โข implies that ๐6๐ is the union of
- ๐ 1๐ป โ ๐ 2 โ ๐ 3 โ ๐ 4 โ ๐ 5 โ ๐ 6 and
- ๐ 1๐ฟ โ ๐ 2 โ ๐ 3 โ ๐ 4 โ ๐ 5 โ ๐ 6
โข To compute the latter, apply the same trick to ๐ 2
57
Multiple Plans: Plan 2
๐ฏ3: computes ๐ 1๐ฟ โ ๐ 2
๐ป โ ๐ 3 โ โฏ โ ๐ 6
๐ 2๐ป ๐ด2, ๐ด3
๐ 3 ๐ด3, ๐ด4 , ๐ 2๐ป ๐ด2, _
๐ 4 ๐ด4, ๐ด5 , ๐ 2๐ป ๐ด2, _
๐ 5 ๐ด5, ๐ด6 , ๐ 2๐ป ๐ด2, _
๐ 6 ๐ด6, ๐ด1 , ๐ 2๐ป ๐ด2, _
๐ 1๐ฟ(๐ด1, ๐ด2)
Degree constraint: ๐๐ด2 ๐ 2๐ป โค ๐2/3
๐ 2๐ป: contains all tuples whose
๐ด2-values occur more than ๐1/3 times (fewer than ๐2/3
such ๐ด2-values exist)
๐ 2๐ฟ = ๐ 2 โ ๐ 2
๐ป
58
Plans 3 to 6
โข Plans discussed so far
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ป โ ๐ 4 โ ๐ 5 โ ๐ 6
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ฟ โ ๐ 4
๐ป โ ๐ 5 โ ๐ 6
โข Continue analogously to compute
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ป โ ๐ 4 โ ๐ 5 โ ๐ 6
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ฟ โ ๐ 4
๐ป โ ๐ 5 โ ๐ 6
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ฟ โ ๐ 4
๐ฟ โ ๐ 5๐ป โ ๐ 6
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ฟ โ ๐ 4
๐ฟ โ ๐ 5๐ฟ โ ๐ 6
๐ป
โข What is missing?
59
The 7-th Plan
โข Join all light-only partitions with each other:
- ๐ 1๐ฟ โ ๐ 2
๐ฟ โ ๐ 3๐ฟ โ ๐ 4
๐ฟ โ ๐ 5๐ฟ โ ๐ 6
๐ฟ
โข Input now satisfies the other degree constraint:
- โ๐ โ {2,3,5,6}: โ๐: ๐๐ด๐+1๐๐ด๐=๐ ๐ ๐ โค ๐1/3
โข Use decomposition ๐ฏ2 for it!
60
Analysis and Discussion
โข Rewrite 6-cycle into 7 sub-queries
- Six of them use ๐ฏ3, copying the heavy attribute to intermediate bags
- One uses ๐ฏ2 on the all-light case
โข Analysis
- Assigning input tuples to subqueries: O(๐)
- Bag materialization: O(๐5/3)
- Bag size: O(๐5/3)
โข Running Yannakakis on each of the 7 trees takes เทฉO ๐5/3 + ๐
- Beats single-tree complexity เทฉO ๐2 + ๐ and WCO-join complexity เทฉO ๐3
61
Tree Decompositions: The Big Picture
โข WCO join algorithms applied to a query Q: complexity determined by the AGM bound for Q
โข Decomposing Q into a tree creates bags that each may have a lower AGM bound value (fractional hypertree width)
โข This reduces time complexity
- Materialize each bag using a WCO join algorithm
- Run Yannakakis algorithm on the tree with materialized bags as input
โข Design goal: minimize width of tree, but ensure that each input relation is covered by a bag and the tree is attribute-connected!
62
Tree Decompositions: Degree Constraints
โข Degree constraints generalize cardinality constraints and functional dependencies
โข Enable improved complexity guarantees
โข Application to general input (with only cardinality constraints):
- Partition input so that each partition satisfies stronger degree constraints
- Use appropriate tree for each partition
- Current frontier: submodular width
โข Application to input DB with degree constraints:
- Degree-aware submodular width
[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODSโ17] https://doi.org/10.1145/3034786.3056105
[Joglekar, Rรฉ. It's all a matter of degree. Theory of Computing Systemsโ18] https://doi.org/10.1007/s00224-017-9811-8
[Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACMโ13] https://doi.org/10.1145/2535926
63
Outline tutorial
โข Part 1: Top-๐ (Wolfgang): ~20minโข Part 2: Optimal Join Algorithms (Mirek): ~30min
โ Lower Bound and the Yannakakis Algorithm
โ Problems Caused by Cycles
โ Tree Decompositions
โ Summary and Further Reading
โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min
64
Optimal-Joins Summary
โข On acyclic CQs, the Yannakakis algorithmโs complexity เทฉO ๐ + ๐matches the lower bound
- Simple join-at-a-time query plan after semi-join reduction sweeps
โข Lower bound cannot be matched for general cyclic CQs
โข WCO joins achieve เทฉO ๐ + ๐WC
- โHolisticโ multi-way join approach
โข Tree decomposition approaches achieve เทฉO ๐๐ + ๐
- Best ๐ is submodular width subw(Q) using multi-tree decomposition
- ๐subw(๐) = O(๐WC), but usually better
โข 6-cycle: ๐WC = ๐3, but multi-tree approach achieves เทฉO ๐5/3 + ๐
65
Further Reading
โข Extensions of the query model
- Projections and aggregation
โข Factorized DB
โข Stronger notions of optimality
โข โฆand many more (see our tutorial paper)
[Khamis, Ngo, Rudra. FAQ: questions asked frequently. PODSโ16] https://doi.org/10.1145/2902251.2902280
[Bakibayev, Koฤiskรฝ, Olteanu, Zรกvodnรฝ. Aggregation and Ordering in Factorised Databases. PVLDBโ13] https://doi.org/10.14778/2556549.2556579
[Bakibayev, Olteanu, Zรกvodnรฝ. FDB: A Query Engine for Factorised Relational Databases. PVLDBโ12] https://doi.org/10.14778/2350229.2350242
[Olteanu, Schleich. Factorized databases. SIGMOD Recordโ16] https://doi.org/10.1145/3003665.3003667
[Olteanu, Zรกvodny. Size bounds for factorised representations of query results. TODSโ15] https://doi.org/10.1145/2656335
[Khamis, Rรฉ, Rudra. Joins via Geometric Resolutions: Worst Case and Beyond. TODSโ16] https://doi.org/10.1145/2967101
[Ngo, Nguyen, Re, Rudra. Beyond worst-case analysis for joins with minesweeper. PODSโ14] https://doi.org/10.1145/2594538.2594547
Recommended