SQL - Joinup3f/cs4750/slides/4750meet13-14-SQL-join.pdf(INNER) JOIN left table right table (INNER)...

Preview:

Citation preview

Fall 2020 – University of Virginia 1© Praphamontripong© Praphamontripong

SQL - Join

CS 4750Database Systems

[A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch.4.1][ https://www.dofactory.com/sql/join ]

Fall 2020 – University of Virginia 2© Praphamontripong

Recap 1: Aggregation

ID name dept_name salary10101 Srinivasan Comp. Sci. 65000.0012121 Wu Finance 90000.0015151 Mozart Music 40000.0022222 Einstein Physics 95000.0032343 El Said History 60000.0033456 Gold Physics 87000.0045565 Kate Comp. Sci. 75000.0058583 Katz History 62000.0076543 Singh Finance 80000.0076766 Brandt Biology 72000.0083821 Kate Comp. Sci. 92000.0098345 Kim Elec. Eng. 80000.00

Instructor (refer to alldbs.sql)

Find average salary for each department

SELECT dept_name, AVG(salary)FROM instructorGROUP BY dept_name;

Fall 2020 – University of Virginia 3© Praphamontripong

Recap 2: Aggregation

ID name dept_name salary10101 Srinivasan Comp. Sci. 65000.0012121 Wu Finance 90000.0015151 Mozart Music 40000.0022222 Einstein Physics 95000.0032343 El Said History 60000.0033456 Gold Physics 87000.0045565 Kate Comp. Sci. 75000.0058583 Katz History 62000.0076543 Singh Finance 80000.0076766 Brandt Biology 72000.0083821 Kate Comp. Sci. 92000.0098345 Kim Elec. Eng. 80000.00

Instructor (refer to alldbs.sql)

Find average salary for each department that is greater than 70000

SELECT dept_name, AVG(salary)FROM instructorGROUP BY dept_nameHAVING AVG(salary) > 70000;

Fall 2020 – University of Virginia 4© Praphamontripong

Recap: Primary Keys, Foreign KeysHiring (TA_id, Name, Year, Two_week_hours)TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

Key = one or more attributes that uniquely

identify a rowTA_Assignment (TA_id, Course)TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.

Foreign key = one or more attributes that uniquely identify a row in another

table

references

Foreign key describes a relationship between tables

Fall 2020 – University of Virginia 5© Praphamontripong

JOIN• Combine related tables or sets of data on key values

• All tables (or relations) must have a unique identifier (primary key)

• Any related tables must contain a copy of the unique identifier (foreign key) from the related table

lefttable

righttable

FULL JOIN

lefttable

righttable

LEFT JOIN

lefttable

righttable

RIGHT JOIN

lefttable

righttable

(INNER) JOIN

lefttable

righttable

NATURAL JOIN

(preserve all rows, uncommon, thus skip)

(most common)

(preserve all rowsin right table)

(preserve all rowsin left table)

Fall 2020 – University of Virginia 6© Praphamontripong

NATURAL JOIN

lefttable

righttable

NATURAL JOIN

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

Not include the repeated columnone(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one NATURAL JOIN two;

for each row1 in one:for each row2 in two:

if (row1.p1 = row2.p1):output (row1.p1, row1.colA, row1.colB,

row2.p2, row2.colX, row2.colY)

Join on common named

attribute(s). If no match

found, move on.

Fall 2020 – University of Virginia 7© Praphamontripong

p1 colA colB p2 colX colY1 A aa 9001 55 ABC2 B bb 9006 42 STU3 C cc 9002 77 PQR3 C cc 9003 95 DEF5 E ee 9005 50 KLM7 G gg 9004 62 GHI7 G gg 9007 83 XYZ

NATURAL JOIN

lefttable

righttable

NATURAL JOINNot include the repeated column

SELECT * FROM one NATURAL JOIN two

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Note the order of the columns; order of rows does not matter

Fall 2020 – University of Virginia 8© Praphamontripong

(INNER) JOIN

lefttable

righttable

(INNER) JOIN

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

Include the repeated columnone(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one JOIN two ON one.p1 = two.p1;

Join on specified attribute(s). If no match

found, move on.

for each row1 in one:for each row2 in two:

if (row1.p1 = row2.p1):output (row1.p1, row1.colA, row1.colB,

row2.p2, row2.p1, row2.colX, row2.colY)

Fall 2020 – University of Virginia 9© Praphamontripong

p1 colA colB p2 p1 colX colY1 A aa 9001 1 55 ABC2 B bb 9006 2 42 STU3 C cc 9002 3 77 PQR3 C cc 9003 3 95 DEF5 E ee 9005 5 50 KLM7 G gg 9004 7 62 GHI7 G gg 9007 7 83 XYZ

(INNER) JOIN

lefttable

righttable

(INNER) JOIN

SELECT * FROM one JOIN twoON one.p1 = two.p1

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one INNER JOIN twoON one.p1 = two.p1

Include the repeated column

Note the order of the columns; order of rows does not matter

Fall 2020 – University of Virginia 10© Praphamontripong

Cartesian Productp1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one JOIN two;

for each row1 in one:for each row2 in two:

output (row1.p1, row1.colA, row1.colB,row2.p2, row2.p1, row2.colX, row2.colY)

SELECT * FROM one, two;

Fall 2020 – University of Virginia 11© Praphamontripong

p1 colA colB p2 p1 colX colY1 A aa 9001 1 55 ABC1 A aa 9005 5 50 KLM1 A aa 9004 7 62 GHI1 A aa 9003 3 95 DEF1 A aa 9007 7 83 XYZ1 A aa 9002 3 77 PQR1 A aa 9006 2 42 STU2 B bb 9002 3 77 PQR2 B bb 9006 2 42 STU2 B bb 9001 1 55 ABC2 B bb 9005 5 50 KLM2 B bb 9004 7 62 GHI2 B bb 9003 3 95 DEF2 B bb 9007 7 83 XYZ3 C cc 9003 3 95 DEF3 C cc 9007 7 83 XYZ... … … ... … … …

Cartesian Product

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB)

two(p2, p1, colX, colY)

one × two

Note the order of the columns; order of rows does not matter

Fall 2020 – University of Virginia 12© Praphamontripong

NATURAL LEFT OUTTER JOINAugment a join by the dangling tuples

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one NATURAL LEFT OUTTER JOIN two;

Join on common named

attribute(s). If no match

found, pad with NULL values

lefttable

righttable

LEFT JOIN

Preserve all rows in left table

Fall 2020 – University of Virginia 13© Praphamontripong

p1 colA colB p2 colX colY1 A aa 9001 55 ABC2 B bb 9006 42 STU3 C cc 9002 77 PQR3 C cc 9003 95 DEF4 D dd NULL NULL NULL5 E ee 9005 50 KLM7 G gg 9004 62 GHI7 G gg 9007 83 XYZ8 H hh NULL NULL NULL

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Note the order of the columns; order of rows does not matter

NATURAL LEFT OUTTER JOIN

lefttable

righttable

LEFT JOINAugment a join by the dangling tuples

Fall 2020 – University of Virginia 14© Praphamontripong

LEFT OUTTER JOINAugment a join by the dangling tuples

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one LEFT OUTTER JOIN twoON one.p1 = two.p1;

Join on specified attribute(s). If no match

found, pad with NULL values

lefttable

righttable

LEFT JOIN

Preserve all rows in left table

Fall 2020 – University of Virginia 15© Praphamontripong

p1 colA colB p2 p1 colX colY1 A aa 9001 1 55 ABC2 B bb 9006 2 42 STU3 C cc 9002 3 77 PQR3 C cc 9003 3 95 DEF4 D dd NULL NULL NULL NULL5 E ee 9005 5 50 KLM7 G gg 9004 7 62 GHI7 G gg 9007 7 83 XYZ8 H hh NULL NULL NULL NULL

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Note the order of the columns; order of rows does not matter

LEFT OUTTER JOIN

lefttable

righttable

LEFT JOINAugment a join by the dangling tuples

Fall 2020 – University of Virginia 16© Praphamontripong

NATURAL RIGHT OUTTER JOINAugment a join by the dangling tuples

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT * FROM one NATURAL RIGHT OUTTER JOIN two;

Join on common named

attribute(s). If no match

found, pad with NULL values

lefttable

righttable

RIGHT JOIN

Preserve all rows in right table

Fall 2020 – University of Virginia 17© Praphamontripong

p1 colA colB p2 colX colY1 A aa 9001 55 ABC2 B bb 9006 42 STU3 C cc 9002 77 PQR3 C cc 9003 95 DEF5 E ee 9005 50 KLM7 G gg 9004 62 GHI7 G gg 9007 83 XYZ

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Note the order of the columns; order of rows does not matter

NATURAL RIGHT OUTTER JOINAugment a join by the dangling tuples

Since two.p1 is a foreign key referencing one.p1, matches are always founded

lefttable

righttable

RIGHT JOIN

Fall 2020 – University of Virginia 18© Praphamontripong

RIGHT OUTTER JOINAugment a join by the dangling tuples

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Join on specified attribute(s). If no match

found, pad with NULL values

lefttable

righttable

RIGHT JOIN

SELECT * FROM one RIGHT OUTTER JOIN twoON one.p1 = two.p1;

Preserve all rows in right table

Fall 2020 – University of Virginia 19© Praphamontripong

p1 colA colB1 A aa2 B bb3 C cc4 D dd5 E ee7 G gg8 H hh

p2 p1 colX colY9001 1 55 ABC9002 3 77 PQR9003 3 95 DEF9004 7 62 GHI9005 5 50 KLM9006 2 42 STU9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Note the order of the columns; order of rows does not matter

RIGHT OUTTER JOINAugment a join by the dangling tuples

Since two.p1 is a foreign key referencing one.p1, matches are always founded

lefttable

righttable

RIGHT JOIN

p1 colA colB p2 p1 colX colY1 A aa 9001 1 55 ABC2 B bb 9006 2 42 STU3 C cc 9002 3 77 PQR3 C cc 9003 3 95 DEF5 E ee 9005 5 50 KLM7 G gg 9004 7 62 GHI7 G gg 9007 7 83 XYZ

Fall 2020 – University of Virginia 20© Praphamontripong

Self Joins• Join with itself

• Useful when the table has a FOREIGN KEY which references its own PRIMARY KEY

• Can be viewed as joining two copies of the same table

• To do self join, use aliases

Fall 2020 – University of Virginia 21© Praphamontripong

(Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

How should we write SQL?

Fall 2020 – University of Virginia 22© Praphamontripong

(Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

SELECT H.NameFrom Hiring AS H, TA_Assignment AS TWHERE H.TA_id = T.TA_id AND

T.Course = "Database Systems" AND

T.Course = "Software Analysis";

Will this work?

No.Returns empty set

Fall 2020 – University of Virginia 23© Praphamontripong

(Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

SELECT H.NameFrom Hiring AS H, TA_Assignment AS T1,

TA_Assignment AS T2WHERE H.TA_id = T1.TA_id AND

H.TA_id = T2.TA_id ANDT1.Course = "Database Systems" ANDT2.Course = "Software Analysis";

Will this work?

NameMinnie

Humpty

Fall 2020 – University of Virginia 24© Praphamontripong

(Challenging) Example: Self JoinsFind names of all TAs who are assigned to Database Systems and Software Analysis

TA_id Name Year Two_week_hours1234 Minnie 4 202345 Humpty 3 243456 Dumpty 4 303333 Minnie 3 125678 Mickey 2 16

TA_id Course1234 Database Systems2345 Database Systems3456 Cloud Computing1234 Software Analysis5678 Web Programming Lang.2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

SELECT H.NameFrom Hiring AS H, TA_Assignment AS T1,

TA_Assignment AS T2WHERE H.TA_id = T1.TA_id AND

H.TA_id = T2.TA_id ANDT1.Course = "Database Systems" ANDT2.Course = "Software Analysis";

Let’s consider each part of the query

All pairs of courses a TA is assigned

For each pair, check for courses

Fall 2020 – University of Virginia 25© Praphamontripong

Wrap-UpJoin combines data across tables

• Nested-loop• Natural join (most common)• Inner join (filter Cartesian product)• Outer joins (preserve non-matching tuples)• Self join pattern

Different joining techniques can be used to achieve particular goals

What’s next? • SQL – subqueries, triggers

lefttable

righttable

LEFT JOIN

lefttable

righttable

RIGHT JOIN

lefttable

righttable

NATURAL JOIN

lefttable

righttable

(INNER) JOIN

Recommended