
Two mathematical Query Languages form the


Database Management Systems, R. Ramakrishnan and J. Gehrke


Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

❶ Relational Algebra: More operational, very useful for representing execution plans.

❷ Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)

☛ Understanding Algebra & Calculus is key to understanding SQL and query processing!

Basic operations:
• Selection ( σ ): gives a subset of rows.
• Projection ( π ): deletes unwanted columns.
• Cross-product ( × ): combines two relations.
• Set-difference ( − ): tuples in relation 1, but not in relation 2.
• Union ( ∪ ): tuples in relation 1 or in relation 2.

Additional operations:
• Intersection ( ∩ ): tuples in both relations.
• Join ( ⋈ ): like × but only keeps tuples where common fields are equal.
• Division ( / ): tuples from relation 1 with matches in relation 2.

Example relations: Reserves, Sailors, Boats

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB]

Prediction: These relational operators are going to look hauntingly familiar when we get to them…!


Find names of sailors who’ve reserved a green boat

π_sname((σ_color='Green' Boats) ⋈ Reserves ⋈ Sailors)

Find names of sailors who've reserved a green boat

Or better yet:

π_sname(π_sid((π_bid σ_color='Green' Boats) ⋈ Reserves) ⋈ Sailors)

* Given the previous algebra, a query optimizer would replace it with this!
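As a minimal sketch of why the two algebra expressions for the green-boat query agree, the operators can be modelled as small Python functions over lists of dicts. The Boats and Reserves rows are the ones shown on the slides; the Sailors rows and the helper names `select`, `project` and `join` are assumptions made for illustration.

```python
# Relations as lists of dicts. Boats and Reserves rows come from the slides;
# the Sailors rows are hypothetical, since the slide's Sailors table was lost.
boats = [
    {"bid": 101, "bname": "Interlake", "color": "Blue"},
    {"bid": 102, "bname": "Interlake", "color": "Red"},
    {"bid": 103, "bname": "Clipper", "color": "Green"},
    {"bid": 104, "bname": "Marine", "color": "Red"},
]
reserves = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]
sailors = [  # assumed sample rows
    {"sid": 22, "sname": "Dustin"},
    {"sid": 58, "sname": "Rusty"},
]

def select(rel, pred):
    """Selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(rel, cols):
    """Projection: keep only cols, dropping duplicate rows (set semantics)."""
    out = []
    for row in rel:
        slim = {c: row[c] for c in cols}
        if slim not in out:
            out.append(slim)
    return out

def join(r1, r2):
    """Natural join: pair rows that agree on every common field."""
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[c] == b[c] for c in a.keys() & b.keys())
    ]

green = select(boats, lambda b: b["color"] == "Green")

# First form: select green boats, join everything, project sname at the end.
naive = project(join(join(green, reserves), sailors), ["sname"])

# Second form: project away unneeded columns before each join.
pushed = project(
    join(project(join(project(green, ["bid"]), reserves), ["sid"]), sailors),
    ["sname"],
)
```

Both forms return the same single row; the second one simply carries narrower tuples through each join.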

For each sailor with a rating > 5 that has reserved at least 2 red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.

SELECT S.sid, MIN(R.day)
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
  AND S.rating > 5
GROUP BY S.sid
HAVING COUNT(*) >= 2
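The query can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the rows below are hypothetical, the tables are trimmed to the columns the query touches, and dates are stored as ISO strings so that `MIN` orders them correctly. Note that `COUNT(*)` counts qualifying reservations, so two reservations of the same red boat would also satisfy the `HAVING` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER);
    CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    -- Hypothetical rows, chosen so that only sailor 1 qualifies.
    INSERT INTO Sailors VALUES (1, 'Frodo', 7), (2, 'Bilbo', 2), (3, 'Sam', 8);
    INSERT INTO Boats VALUES (101, 'red'), (102, 'blue'), (103, 'red');
    INSERT INTO Reserves VALUES
        (1, 101, '1996-09-12'), (1, 103, '1996-09-10'),  -- two red boats
        (2, 101, '1996-09-11'), (2, 103, '1996-09-13'),  -- rating too low
        (3, 101, '1996-09-14'), (3, 102, '1996-09-15');  -- only one red boat
""")

rows = conn.execute("""
    SELECT S.sid, MIN(R.day)
    FROM Sailors S, Reserves R, Boats B
    WHERE S.sid = R.sid AND R.bid = B.bid
      AND B.color = 'red' AND S.rating > 5
    GROUP BY S.sid
    HAVING COUNT(*) >= 2
""").fetchall()
print(rows)
```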

[Figure: query plan for the same query, root to leaves: π S.sid, MIN(R.day); HAVING COUNT(*) >= 2; GROUP BY S.sid; σ B.color = 'red' ∧ S.rating > 5; joins of Sailors, Reserves, Boats]

Relational algebra equivalences allow us to choose different join orders and to 'push' selections and projections ahead of joins.

Selections can be cascaded:

σ_{c1 ∧ … ∧ cn}(R) ≡ σ_{c1}(…(σ_{cn}(R))…)

• Can apply these predicates separately
• Can 'push' S.rating > 5 down to Sailors

[Figure: query plan for the same query, root to leaves: π S.sid, MIN(R.day); HAVING COUNT(*) >= 2; GROUP BY S.sid; σ B.color = 'red'; joins of Reserves, Boats and σ S.rating > 5 (Sailors)]

Selections can be commuted:

σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))

• Can apply these predicates in different order

[Figure: query plan for the same query, root to leaves: π S.sid, MIN(R.day); HAVING COUNT(*) >= 2; GROUP BY S.sid; σ S.rating > 5; joins of Reserves, Sailors and σ B.color = 'red' (Boats)]
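The cascade and commute rules for selection can be checked on a toy relation. This is only an illustrative sketch: the rows and the `select` helper are hypothetical, standing in for the joined Sailors, Reserves and Boats tuples.

```python
# Hypothetical joined rows; only the attributes used by the two predicates.
rows = [
    {"sid": 1, "rating": 7, "color": "red"},
    {"sid": 2, "rating": 2, "color": "red"},
    {"sid": 3, "rating": 8, "color": "blue"},
]

def select(rel, pred):
    """Selection: keep the rows that satisfy pred."""
    return [r for r in rel if pred(r)]

def is_red(r):
    return r["color"] == "red"

def high_rating(r):
    return r["rating"] > 5

# One selection with the conjunction of both predicates.
combined = select(rows, lambda r: is_red(r) and high_rating(r))
# Cascade: the same conjunction split into two nested selections.
cascaded = select(select(rows, high_rating), is_red)
# Commute: the two selections applied in the opposite order.
commuted = select(select(rows, is_red), high_rating)
```

All three variables hold the same rows, which is what lets an optimizer apply each predicate wherever it is cheapest.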

Projections can be cascaded:

π_{a1}(R) ≡ π_{a1}(…(π_{a1, …, an}(R))…)

• Can project S.sid to reduce size of tuples

[Figure: query plan for the same query, root to leaves: π S.sid, MIN(R.day); HAVING COUNT(*) >= 2; GROUP BY S.sid; σ B.color = 'red'; joins of Reserves, Boats and π S.sid (σ S.rating > 5 (Sailors))]

Eager projection
◦ Can cascade and "push" some projections through selection
◦ Can cascade and "push" some projections below one side of a join
◦ Rule of thumb: can project anything not needed "downstream"

[Figure: query plan for the same query with projections added. Labels shown: π S.sid, MIN(R.day); HAVING COUNT(*) >= 2; GROUP BY S.sid; π S.sid, R.day; σ B.color = 'red'; π B.bid, B.color; π R.sid, R.bid, R.day over Reserves; π S.sid over σ S.rating > 5 over Sailors; further projections marked "π ??"]

(R1 ⋈ R2) ⋈ R3 ≡ R1 ⋈ (R2 ⋈ R3)
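A quick way to convince yourself that join is associative is to evaluate both groupings on a small instance. The sketch below assumes natural-join semantics and uses trimmed-down rows modelled on the Sailors, Reserves and Boats examples; `join` and `as_set` are hypothetical helpers.

```python
def join(r1, r2):
    """Natural join of two lists of dicts on their common fields."""
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[c] == b[c] for c in a.keys() & b.keys())
    ]

def as_set(rel):
    """Order-insensitive view of a relation, for comparing results."""
    return {frozenset(row.items()) for row in rel}

# Trimmed-down sample rows (only the columns needed for the joins).
sailors = [{"sid": 1, "sname": "Frodo"}, {"sid": 2, "sname": "Bilbo"}]
reserves = [{"sid": 1, "bid": 102}, {"sid": 2, "bid": 103}]
boats = [{"bid": 102, "color": "blue"}, {"bid": 103, "color": "red"}]

left = join(join(sailors, reserves), boats)    # (R1 join R2) join R3
right = join(sailors, join(reserves, boats))   # R1 join (R2 join R3)
```

The two results contain the same tuples, so the optimizer is free to pick whichever grouping produces the smaller intermediate result.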

A domain is referred to in a relation schema by the domain name and has a set of associated values.
◦ Students(sid: string, name: string, login: string, age: integer, gpa: real)
◦ The set of values associated with domain string is the set of all character strings.

SQL is the most widely used relational query language.
• Standardized (although most systems add their own "special sauce" -- including PostgreSQL)
• We will study SQL92 -- a basic subset

Two sublanguages:
◦ DDL – Data Definition Language: define and modify schema (at all 3 levels)
◦ DML – Data Manipulation Language: queries and IUD (insert, update, delete)

DBMS is responsible for efficient evaluation.
◦ Relational completeness means we can define precise semantics for relational queries.
◦ Optimizer can re-order operations, without affecting query answer.
◦ Choices driven by "cost model"

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    102  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red

CREATE TABLE Sailors (
  sid INTEGER NOT NULL,
  sname CHAR(20),
  rating INTEGER,
  age REAL,
  PRIMARY KEY (sid))

CREATE TABLE Boats (
  bid INTEGER NOT NULL,
  bname CHAR(20),
  color CHAR(10),
  PRIMARY KEY (bid))

CREATE TABLE Reserves (
  sid INTEGER NOT NULL,
  bid INTEGER NOT NULL,
  day DATE NOT NULL,
  PRIMARY KEY (sid, bid, day),
  FOREIGN KEY (sid) REFERENCES Sailors,
  FOREIGN KEY (bid) REFERENCES Boats)
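The schema can be exercised with Python's built-in `sqlite3` module. This is a sketch assuming SQLite rather than the course's DBMS: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and it stores the `DATE` column as text. The rejected row (sailor 99) is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER NOT NULL, sname CHAR(20), rating INTEGER,
                          age REAL, PRIMARY KEY (sid));
    CREATE TABLE Boats (bid INTEGER NOT NULL, bname CHAR(20), color CHAR(10),
                        PRIMARY KEY (bid));
    CREATE TABLE Reserves (sid INTEGER NOT NULL, bid INTEGER NOT NULL,
                           day DATE NOT NULL,
                           PRIMARY KEY (sid, bid, day),
                           FOREIGN KEY (sid) REFERENCES Sailors,
                           FOREIGN KEY (bid) REFERENCES Boats);
    INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39),
                               (3, 'Sam', 8, 27);
    INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'),
                             (103, 'Santa Maria', 'red');
    INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 102, '9/13');
""")

try:
    # Sailor 99 does not exist, so the foreign key should reject this row.
    conn.execute("INSERT INTO Reserves VALUES (99, 101, '9/14')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

reservation_count = conn.execute("SELECT COUNT(*) FROM Reserves").fetchone()[0]
```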

A foreign key constraint is an Integrity Constraint:
◦ A condition that must be true for any instance of the database.
◦ Specified when schema is defined.
◦ Checked when relations are modified.

Primary/foreign key constraints; but databases support more general constraints as well.
◦ e.g. domain constraints like: Rating must be between 1 and 10

ALTER TABLE SAILORS ADD CONSTRAINT RATING CHECK (RATING >= 1 AND RATING <= 10)

Or even more complex (and potentially nonsensical):

ALTER TABLE SAILORS ADD CONSTRAINT RATING CHECK (RATING*AGE/4 <= SID)
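The rating range constraint can be tried with Python's `sqlite3` module. This sketch makes two assumptions: "between 1 and 10" is read as inclusive, and the CHECK is declared in `CREATE TABLE`, because SQLite does not support `ALTER TABLE ... ADD CONSTRAINT`. The constraint name `rating_range` and the sailor 'Pippin' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite cannot add a CHECK with ALTER TABLE, so it is declared at creation.
conn.execute("""
    CREATE TABLE Sailors (
        sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL,
        CONSTRAINT rating_range CHECK (rating >= 1 AND rating <= 10)
    )
""")
conn.execute("INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22)")  # accepted

try:
    # A rating of 0 falls outside the allowed range and should be rejected.
    conn.execute("INSERT INTO Sailors VALUES (4, 'Pippin', 0, 28)")
    violation = None
except sqlite3.IntegrityError as exc:
    violation = str(exc)  # exact wording depends on the SQLite version

sailor_count = conn.execute("SELECT COUNT(*) FROM Sailors").fetchone()[0]
```

The text of `violation` is whatever the DBMS chooses to report, which is why an application may struggle to turn it into a message that means something to the end user.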

Specify them on CREATE or ALTER TABLE statements

Column Constraints: expressions for column constraint must produce boolean results and reference the related column's value only.

{ NOT NULL | NULL | UNIQUE | PRIMARY KEY | CHECK (expression) |
  FOREIGN KEY (column) REFERENCES referenced_table [ ON DELETE action ] [ ON UPDATE action ] }

action is one of: NO ACTION, CASCADE, SET NULL, SET DEFAULT

Table Constraints:

{ UNIQUE ( column_name [, ... ] ) |
  PRIMARY KEY ( column_name [, ... ] ) |
  CHECK ( expression ) |
  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ON DELETE action ] [ ON UPDATE action ] }

Here, expressions, keys, etc. can include multiple columns.

DBMSs have fairly sophisticated support for constraints!

…but they have drawbacks:
◦ Expensive
◦ Can't always return a meaningful error back to the application. e.g. What if you saw this error when you enrolled in a course online?
  "A violation of the constraint imposed by a unique index or a unique constraint occurred".
◦ Can be inconvenient. e.g. What if the 'Sailing Class' application wants to register new (unrated) sailors with rating 0?

So they aren't widely used
◦ Software developers often prefer to keep the integrity logic in applications instead

DML includes 4 main statements: SELECT (query), INSERT, UPDATE and DELETE
(We'll spend a lot of time on SELECT.)

e.g. To find the names of all 19 year old students:

SELECT S.name
FROM Students S
WHERE S.age = 19

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@ee    18   3.2
53650  Smith  smith@math  19   3.8

[Figure labels: SELECT, PROJECT]

Can specify a join over two tables as follows:

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'B'

Enrolled
sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

result =
S.name  E.cid
Jones   History105

[Figure labels: SELECT, JOIN, PROJECT]

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification

relation-list: A list of relation names, possibly with a range-variable after each name.

target-list: A list of attributes of tables in relation-list.

DISTINCT: optional keyword indicating answer should not contain duplicates. In SQL, default is that duplicates are not eliminated! (Result is called a "multiset")

qualification: Comparisons combined using AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, <>, <, >, <=, >=, etc.

Semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:
1. FROM clause: compute cross-product of all tables.
2. WHERE clause: check conditions, discard tuples that fail (called "selection").
3. SELECT clause: delete unwanted fields (called "projection").
4. If DISTINCT specified, eliminate duplicate rows.

Probably the least efficient way to compute a query!
◦ An optimizer will find more efficient strategies to get the same answer.
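The four conceptual steps can be written out literally in Python. The sketch below applies them to the query `SELECT sname FROM Sailors, Reserves WHERE Sailors.sid = Reserves.sid AND bid = 103`, using the sample Sailors and Reserves rows from the slides that follow; it illustrates the semantics and is not how a DBMS would actually execute the query.

```python
from itertools import product

# Sample instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]
reserves = [(1, 102, "9/12"), (2, 103, "9/13")]

# 1. FROM: cross-product of all tables.
#    Columns are now: sid, sname, rating, age, sid, bid, day.
cross = [s + r for s, r in product(sailors, reserves)]

# 2. WHERE: discard tuples that fail Sailors.sid = Reserves.sid AND bid = 103.
kept = [t for t in cross if t[0] == t[4] and t[5] == 103]

# 3. SELECT: delete unwanted fields, keeping only sname.
projected = [(t[1],) for t in kept]

# 4. DISTINCT (if specified): eliminate duplicate rows, preserving order.
distinct = list(dict.fromkeys(projected))
```

The cross-product has |Sailors| * |Reserves| = 6 rows even though only one survives the WHERE clause, which is the inefficiency an optimizer avoids.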

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red

SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid AND bid = 103

Sailors × Reserves

sid  sname  rating  age  sid  bid  day
1    Frodo  7       22   1    102  9/12
1    Frodo  7       22   2    103  9/13
2    Bilbo  2       39   1    102  9/12
2    Bilbo  2       39   2    103  9/13
3    Sam    8       27   1    102  9/12
3    Sam    8       27   2    103  9/13

Question: If |S| is the cardinality of Sailors, and |R| is the cardinality of Reserves, what is the cardinality of Sailors × Reserves?

Answer: |S| * |R|, so |Sailors × Reserves| = 3 × 2 = 6

Sailors × Reserves

sid  sname  rating  age  sid  bid  day
1    Frodo  7       22   1    102  9/12
1    Frodo  7       22   2    103  9/13
2    Bilbo  2       39   1    102  9/12
2    Bilbo  2       39   2    103  9/13
3    Sam    8       27   1    102  9/12
3    Sam    8       27   2    103  9/13

SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid AND bid = 103

Range variables:
• Used for shorthand
• Needed when ambiguity could arise, e.g. two tables with the same column name:

SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid AND Reserves.bid = 103

SELECT sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND R.bid = 103

Question: do range variables remind you of anything?
• Variables in relational calculus

e.g. a self-join:

SELECT R1.bid, R1.day
FROM Reserves R1, Reserves R2
WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid

Reserves (used as both R1 and R2)
sid  bid  day
1    102  9/12
3    103  9/12
4    103  9/13
2    103  9/12

Result
bid  day
103  9/12
103  9/12

SELECT R1.bid, R1.day
FROM Reserves R1, Reserves R2
WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid

Result
bid  day
103  9/12
103  9/12

What are we computing?

Boats reserved on the same day by different sailors
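The self-join can be reproduced with Python's `sqlite3` module on the Reserves instance from the slide. The second query, with `DISTINCT`, is an addition for comparison and not part of the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO Reserves VALUES (?, ?, ?)",
    [(1, 102, "9/12"), (3, 103, "9/12"), (4, 103, "9/13"), (2, 103, "9/12")],
)

# Two range variables over the same table: every pair (R1, R2) is considered.
rows = conn.execute("""
    SELECT R1.bid, R1.day
    FROM Reserves R1, Reserves R2
    WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid
""").fetchall()

# Same query with DISTINCT, added here to collapse the mirrored pair.
distinct_rows = conn.execute("""
    SELECT DISTINCT R1.bid, R1.day
    FROM Reserves R1, Reserves R2
    WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid
""").fetchall()
```

The row (103, 9/12) appears twice in `rows` because sailors 3 and 2 match each other in both orders, once as (R1, R2) and once as (R2, R1).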

Can use arithmetic expressions (add other operations we'll discuss later)

SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname = 'Dustin'

SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1

• Can use AS to provide column names

• Can use "*" if you want all columns:

SELECT *
FROM Sailors x
WHERE x.age > 20

• Can also have expressions in WHERE clause:

SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1

• "LIKE" is used for string matching.

SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname LIKE 'B_l%o'

'_' stands for any one character and '%' stands for 0 or more arbitrary characters.
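A small `sqlite3` sketch of the LIKE query, run against the three-row Sailors instance used elsewhere in the slides. `S.sname` is added to the target list (an addition to the slide's query) so the matching row is visible. One assumption to keep in mind: SQLite's LIKE is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)],
)

# 'B_l%o': a 'B', any one character, an 'l', any run of characters, an 'o'.
rows = conn.execute("""
    SELECT S.sname, S.age, S.age - 5 AS age1, 2 * S.age AS age2
    FROM Sailors S
    WHERE S.sname LIKE 'B_l%o'
""").fetchall()
```

Only 'Bilbo' fits the pattern: 'i' matches `_` and 'b' matches `%`.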

Find sailors that have reserved at least one boat

SELECT DISTINCT S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/12
2    102  9/13

Result
sid
1
2

How about:

SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Result
sid
1
2
2

How about:

SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
4    Bilbo  5       32

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13

Result
sname
Frodo
Bilbo
Bilbo

vs:

SELECT DISTINCT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

Result
sname
Frodo
Bilbo

Do we find all sailors that reserved at least one boat?
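The difference between the variants is easy to see by running them. This `sqlite3` sketch uses the Sailors and Reserves instances from this slide; the `DISTINCT S.sid` query is included for comparison, and results are sorted only to make them deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39),
                               (3, 'Sam', 8, 27), (4, 'Bilbo', 5, 32);
    INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 103, '9/13'),
                                (4, 105, '9/13');
""")

join_clause = "FROM Sailors S, Reserves R WHERE S.sid = R.sid"

# Without DISTINCT, each matching reservation contributes one output row.
names = sorted(conn.execute(f"SELECT S.sname {join_clause}").fetchall())
# DISTINCT on sname merges the two different sailors named Bilbo.
distinct_names = sorted(
    conn.execute(f"SELECT DISTINCT S.sname {join_clause}").fetchall()
)
# DISTINCT on sid keeps them apart, because sid is the key.
distinct_sids = sorted(
    conn.execute(f"SELECT DISTINCT S.sid {join_clause}").fetchall()
)
```

With `DISTINCT S.sname` the answer has two rows although three different sailors hold reservations, so the names alone do not identify every sailor.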

ANDs, ORs, UNIONs and INTERSECTs

SELECT R.sid
FROM Boats B, Reserves R
WHERE (B.color = 'red' OR B.color = 'green')
  AND R.bid = B.bid

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

Result
sid
2
4

ANDs and ORs

SELECT R.sid
FROM Boats B, Reserves R
WHERE (B.color = 'red' AND B.color = 'green')
  AND R.bid = B.bid

Reserves
sid  bid  day
1    101  9/12
2    103  9/13
1    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

Use INTERSECT instead of AND

SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'red'
  AND R.bid = B.bid
INTERSECT
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'green'
  AND R.bid = B.bid

Reserves
sid  bid  day
1    101  9/12
2    103  9/13
1    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

Result: sid {1, 2} ∩ sid {1} = sid {1}

Exercise: try to rewrite this query using a self join instead of INTERSECT!

Could also use UNION for the OR query

SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'red' AND R.bid = B.bid
UNION
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'green' AND R.bid = B.bid

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13

Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green

Result: sid {2} ∪ sid {4} = sid {2, 4}

Find sids of sailors who have not reserved a boat

SELECT S.sid
FROM Sailors S
EXCEPT
SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid

First find the set of sailors who have reserved a boat… and then compare it with the rest of the sailors

Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27

Reserves
sid  bid  day
1    102  9/12
2    103  9/13
1    105  9/13

Result
sid
3
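The set operators can be compared side by side with Python's `sqlite3` module. This sketch uses the Reserves instance from the INTERSECT and EXCEPT slides (sailor 1 reserves boats 101 and 105, sailor 2 reserves boat 103), so the UNION result differs from the one on the UNION slide, which uses a different Reserves instance. The helpers `sids` and `run` are hypothetical conveniences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER, sname TEXT);
    CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (1, 'Frodo'), (2, 'Bilbo'), (3, 'Sam');
    INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'),
                             (103, 'Santa Maria', 'red'), (105, 'Titanic', 'green');
    INSERT INTO Reserves VALUES (1, 101, '9/12'), (2, 103, '9/13'),
                                (1, 105, '9/13');
""")

def sids(color):
    """SQL text for the sids of sailors who reserved a boat of this color."""
    return (f"SELECT R.sid FROM Boats B, Reserves R "
            f"WHERE B.color = '{color}' AND R.bid = B.bid")

def run(sql):
    """Run a one-column query and return its values in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

# Reserved a red boat AND a green boat (possibly different boats).
both = run(f"{sids('red')} INTERSECT {sids('green')}")
# Reserved a red boat OR a green boat.
either = run(f"{sids('red')} UNION {sids('green')}")
# AND inside one WHERE clause asks for a single boat with both colors.
same_boat = run("SELECT R.sid FROM Boats B, Reserves R "
                "WHERE (B.color = 'red' AND B.color = 'green') AND R.bid = B.bid")
# Sailors with no reservation at all.
never = run("SELECT S.sid FROM Sailors S EXCEPT "
            "SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid = R.sid")
```

`same_boat` comes back empty because no single Boats row is both red and green, which is why the slides switch to INTERSECT for the "red and green" question.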