Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
joinsprof. Stratos Idreos
HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/
class 14
/34CS165, Fall 2016 Stratos Idreos 2
HOW TO DO EXPERIMENTS?
find out what matters, change one thing at a time
say we want to test the select operator, to compare scan vs secondary index
/34CS165, Fall 2016 Stratos Idreos 3
do not listen to youtube while you run experiments!close all apps, recreate the same environment every time
create scripts for everything ask to use our server
ISOLATE PERFORMANCE AS BEST AS POSSIBLE
/34CS165, Fall 2016 Stratos Idreos 4
resp
onse
tim
e (s
ecs)
data size (GB)re
spon
se ti
me
(sec
s)data size (GB)
e.g., to test the select operator
OK not OK
examples needed for final evaluation
/34CS165, Fall 2016 Stratos Idreos 5
thro
ughp
ut (q
/s)
# of queriesth
roug
hput
(q/s
)# of queries
e.g., to test shared scans
OK not OK
examples needed for final evaluation
/34CS165, Fall 2016 Stratos Idreos 6
fast scans
hardware, data and query based optimizations(project=m3)
apply to all algo/data structures
/34CS165, Fall 2016 Stratos Idreos 7
so far
database kernel
data data data
algo
rithm
s/op
erat
ors
disk
memory
cpu
scan, binary search, tuple reconstruction, min,
max, search b-tree, etc.
arrays, columns, matrixes, rows,trees
early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing
selects down, etc
/34CS165, Fall 2016 Stratos Idreos 8
joins(project=m4)
/34CS165, Fall 2016 Stratos Idreos 9
fact table(id1,id2,…)
dimension table 1(id1,…)
dimension table 2(id2,…)
…
…
…
star schema
/34CS165, Fall 2016 Stratos Idreos 10
avoid duplicates - minimize update cost - but we have to do joins
/34CS165, Fall 2016 Stratos Idreos 11
…
snowflake schema
/34CS165, Fall 2016 Stratos Idreos 12
professor(id,name,…)
course(id,name, profId,…)
student(id,name,…)
database
give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
enrolled(studentId,
courseId,…) foreign key
join
/34CS165, Fall 2016 Stratos Idreos 13
table 1 table 2
joinforeign key
referencing table 2 primary key
find all tuples where FK=PKjoin: glue the data back together
/34CS165, Fall 2016 Stratos Idreos 13
table 1 table 2
joinforeign key
referencing table 2 primary key
find all tuples where FK=PKjoin: glue the data back together
equi join
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2016 Stratos Idreos 15
inner join
left outer
right outer
v1, a1 v2, a2
v3, b1 v1, b2
v1, a1, b2
v1, a1, b2 v2, a2, null
v1, a1, b2 v3, null, b1
join
/34CS165, Fall 2016 Stratos Idreos 16
select courses.name=“cs165”
join enrolled.courseid=course.id
students enrolled courses
join student.id=enrolled.studentid
project student.name
good plan
select student.name from students, enrolled, courses where courses.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
/34CS165, Fall 2016 Stratos Idreos 17
select courses.name=“cs165”
join enrolled.courseid=course.id
students enrolled courses
join student.id=enrolled.studentid
project student.name
select student.name from students, enrolled, courses where courses.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
pushing selects down
/34CS165, Fall 2016 Stratos Idreos 18
selectR.C
selectS.F
join
fetchR.A
fetchS.A
maxR.D
minS.G
R(A,B,C,D) - S(A,E,F,G)
select max(R.D),min(S.G) from R,S where R.A=S.A and R.C<10 and S.F>30
/34CS165, Fall 2016 Stratos Idreos 18
selectR.C
selectS.F
join
fetchR.A
fetchS.A
maxR.D
minS.G
R(A,B,C,D) - S(A,E,F,G)
select max(R.D),min(S.G) from R,S where R.A=S.A and R.C<10 and S.F>30
block operator
access patterns
/34CS165, Fall 2016 Stratos Idreos 19
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 19
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 20
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 20
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 21
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 21
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
project m4
/34CS165, Fall 2016 Stratos Idreos 22
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 22
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2016 Stratos Idreos 23
new resL[]; new resR[]; k=0 for (i=0;i<L.size;i=i++) for (j=0;j<R.size;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
nested loops
L R
for all tuples of one side check all tuples of the other side
/34CS165, Fall 2016 Stratos Idreos 24
outer(r) inner(s)join probe the small side
for better data locality
stream outer pages hold inner pages
Level1
Level2
res
r+s+res, s<=L1-2
say red fits in L1
/34CS165, Fall 2016 Stratos Idreos 25
what if not all data fits in main-memory?
what if not all data fits in L3 cache?
what if not all data fits in L2 cache?
what if not all data fits in L1 cache?
can we exploit the >1 cores in modern CPUs?
/34CS165, Fall 2016 Stratos Idreos 26
for every R.key = S.key pair
return [R.pos,S.pos]
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
joinR: key,payload S: key,payload
1) design a nested loops join algorithm and give its cost 2) which table do we start from? 3) think of optimizations to minimize L1 misses 4) can we use sorting?
Assume we have a b-tree on one of the join columns: 5) how would you use the b-tree to do a join? 6) would you use the b-tree or the nested loops join?
level 1
level 2
R columns > L1, all S columns >L1, R+S << L2, L1 block = L2 block CPU can read directly from L1 only
CPU
data/results one column-at-a-time
/34CS165, Fall 2016 Stratos Idreos 27
zig zagouter inner
phase 1
123
outer innerphase 2
123
…
will still be in LLC
/34CS165, Fall 2016 Stratos Idreos 28
new resL[]; new resR[]; k=0 for (i=0;i<L.size;i=i+pagesize) for (j=0;j<R.size;j=j+pagesize) for (r=i;r<i+pagesize;r++) for (m=j;m<j+pagesize;m++) if L[r]==R[m]
resL[k]=r resR[k++]=m
comp O(LxR) I/O O(L/p+Lx(R/p))
comp O(LxR) I/O O(L/p+(L/p)x(R/p))
blocked nested loopsnew resL[]; new resR[]; k=0 for (i=0;i<L.size;i=i++) for (j=0;j<R.size;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
/34CS165, Fall 2016 Stratos Idreos 29
sort
on k
ey
sort
on k
ey
merge
sort merge join
while left and right still have tuples if left.val < right.val left++ else if left.val > right.val right++ else add to result, left++, right++
perfect if data is sorted or needs to be sorted anyway
for the rest of the plan
+ handle duplicates on both sides!
/34CS165, Fall 2016 Stratos Idreos 30
exploit index
new resL[]; new resR[]; k=0 for (i=0;i<L.size;i=i++) jk=R.btree.probe(L[i]) if (jk!=null) resL[k]=i
resR[k++]=jk.pos
even with buffering, we might have to read several (random) pages per probe (to traverse the tree)
L R
/34CS165, Fall 2016 Stratos Idreos 31
outerinner
oraclesearch in O(1)
/34CS165, Fall 2016 Stratos Idreos 32
joins(project=m4)
what to do in m4?nested loops and hash joins cache conscious and multi-core (within reason)
/34CS165, Fall 2016 Stratos Idreos 33
textbook: chapters 4, 14
DATA SYSTEMSprof. Stratos Idreos
class 14
joins