
# Discussion #23: Relational Algebra


Topics

• Algebras
• Relational Algebra
– use of standard notation
– set operators $\cup$, $\cap$, $-$
– renaming $\rho$
– selection $\sigma$
– projection $\pi$
– cross product $\times$
– join $\bowtie$
• Queries (from English)
• Query optimization
• SQL

Relational Algebra

• What is an algebra?
– a pair: (set of values, set of operations)
– compare: ADT, type, class, object
– e.g. stack: (set of all stacks, {pop, push, top, …})
– e.g. integer: (set of all integers, {+, -, *, ÷})
• What is relational algebra?
– (set of relations, set of relational operators)
– relational operators: {$\cup$, $\cap$, $-$, $\rho$, $\sigma$, $\pi$, $\times$, $\bowtie$}

Relational Algebra is Closed

• Closed: all operations produce values in the value set
– (reals, {+, *, -}) closed
– (reals, {+, *, -, ÷}) not closed (divide by 0)
– (reals, {+, *, >}) not closed (T/F not in value set)
– (computer reals, {+, *, -}) not closed (overflow, roundoff)
– (relations, relational operators) closed
• Implication: we can always nest relational operators; we can't for algebras that are not closed.
– e.g. after overflow, can do nothing
– e.g. can't always nest: (2 < 3) + 5 = ?

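Set Operations: $\cup$, $\cap$, and $-$

• Relations are sets; thus set operations should work.
• Examples:

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |

S =

| A | B |
|---|---|
| 2 | 2 |
| 2 | 3 |
| 4 | 2 |
| 5 | 5 |

$R \cup S$ =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |
| 4 | 2 |
| 5 | 5 |

$R \cap S$ =

| A | B |
|---|---|
| 2 | 2 |
| 2 | 3 |

$R - S$ =

| A | B |
|---|---|
| 1 | 2 |

$S - R$ =

| A | B |
|---|---|
| 4 | 2 |
| 5 | 5 |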

Set Operations (continued …)

• Definition: schema(R) = {A, B} = AB, i.e. the set of attributes
• We sometimes write R(AB) to mean the relation R with schema AB.
• Definition: union compatible
– schema(R) = schema(S)
– required precondition for $\cup$, $\cap$, $-$
• Definitions:
– $R \cup S = \{ t \mid t \in R \lor t \in S \}$
– $R \cap S = \{ t \mid t \in R \land t \in S \}$
– $R - S = \{ t \mid t \in R \land t \notin S \}$
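The three definitions map directly onto Python's built-in set operators. The sketch below is only an illustration: it assumes both relations have schema AB and writes each tuple as a plain `(A, B)` pair, using the R and S from the set-operations example.

```python
# Relations R and S from the set-operations example, both with schema AB.
# Each tuple is written as a plain Python tuple (A, B).
R = {(1, 2), (2, 2), (2, 3)}
S = {(2, 2), (2, 3), (4, 2), (5, 5)}

union = R | S          # { t | t in R or t in S }
intersection = R & S   # { t | t in R and t in S }
r_minus_s = R - S      # { t | t in R and t not in S }
s_minus_r = S - R      # difference is not symmetric

for name, rel in [("R union S", union), ("R intersect S", intersection),
                  ("R - S", r_minus_s), ("S - R", s_minus_r)]:
    print(name, sorted(rel))
```

Because positional pairs carry no attribute names, this representation silently assumes union compatibility; it does not check it.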

Tuple Restriction: [X]

• Restriction is a tuple operator (not a relational operator).
• t[X] restricts tuple t to the attributes in X.

t =

| A | B | C |
|---|---|---|
| 1 | 2 | 3 |

t = (1,2,3)
t[A] = (1)
t[AC] = (1,3)

In detail: t[A] = (1,2,3)[A] = {(A,1), (B,2), (C,3)}[A] = {(A,1)} = (1)
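A minimal sketch of tuple restriction, assuming a tuple is stored the way the derivation above writes it: as a set of (attribute, value) pairs. The function name `restrict` is an illustrative choice, not part of the slides.

```python
def restrict(t, attrs):
    """Return t[X]: keep only the (attribute, value) pairs whose attribute is in X."""
    return frozenset((a, v) for (a, v) in t if a in attrs)

# t = (1, 2, 3) over schema ABC, written as a set of (attribute, value) pairs.
t = frozenset({("A", 1), ("B", 2), ("C", 3)})

t_a = restrict(t, {"A"})        # t[A]
t_ac = restrict(t, {"A", "C"})  # t[AC]

print(sorted(t_a))
print(sorted(t_ac))
```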

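Renaming: $\rho$

• $\rho_{A \leftarrow B} R$ renames attribute A to be B.
– A must be in schema(R)
– B must not be in schema(R)
• Example: let

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |

Q =

| A | C |
|---|---|
| 2 | 2 |
| 3 | 2 |

• $R \cup Q$ = ? Not union compatible.
• But with $\rho$:

$\rho_{C \leftarrow B} Q$ =

| A | B |
|---|---|
| 2 | 2 |
| 3 | 2 |

$R \cup \rho_{C \leftarrow B} Q$ =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |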

Renaming (continued…)

• $Q = \rho_{A \leftarrow B} R$ renames attribute A to B; the result is Q.
• Precondition:
– $A \in \text{schema}(R)$
– $B \notin \text{schema}(R)$
• Postcondition:
– $\text{schema}(Q) = (\text{schema}(R) - \{A\}) \cup \{B\}$
– $Q = \{ t' \mid \exists t\, (t \in R \land t' = (t - \{(A, t[A])\}) \cup \{(B, t[A])\}) \}$

R = {{(A,1), (C,2)}, {(A,2), (C,2)}}

$Q = \rho_{A \leftarrow B} R$ = {{(B,1), (C,2)}, {(B,2), (C,2)}}
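The pre- and postconditions for renaming can be sketched in a few lines of Python. This is an illustration under stated assumptions: a relation is passed as a schema (a frozenset of attribute names) plus a set of tuples, each tuple being a frozenset of (attribute, value) pairs, and the function name `rename` is hypothetical.

```python
def rename(schema, tuples, old, new):
    """Sketch of renaming: rename attribute `old` to `new`."""
    # Preconditions: old is in schema(R), new is not in schema(R).
    if old not in schema:
        raise ValueError(f"{old} is not in the schema")
    if new in schema:
        raise ValueError(f"{new} is already in the schema")
    # schema(Q) = (schema(R) - {old}) union {new}
    new_schema = (schema - {old}) | {new}
    # t' = (t - {(old, t[old])}) union {(new, t[old])}
    new_tuples = {
        frozenset((new if a == old else a, v) for (a, v) in t)
        for t in tuples
    }
    return new_schema, new_tuples

R_schema = frozenset({"A", "C"})
R = {frozenset({("A", 1), ("C", 2)}), frozenset({("A", 2), ("C", 2)})}

Q_schema, Q = rename(R_schema, R, "A", "B")
print(sorted(Q_schema), [sorted(t) for t in Q])
```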

Selection: $\sigma$

• The selection operation selects the tuples that satisfy a condition.

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |

$\sigma_{A=1} R$ =

| A | B |
|---|---|
| 1 | 2 |

$\sigma_{B=2} R$ =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |

$\sigma_{A=2 \land B \ge 2} R$ =

| A | B |
|---|---|
| 2 | 2 |
| 2 | 3 |

$\sigma_{A=3} R$ =

| A | B |
|---|---|

Note: the result is empty, but it still retains the schema.

• Precondition: each attribute mentioned in P must be in schema(R).
• Postcondition: $\sigma_P R = \{ t \mid t \in R \land P(t) \}$ and $\text{schema}(\sigma_P R) = \text{schema}(R)$
• Meaning: apply predicate P to tuple t by substituting into P appropriate t values.
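A possible sketch of selection in Python, reproducing the examples above. The representation (a schema plus a set of tuples, each tuple a frozenset of (attribute, value) pairs) and the helper names `make` and `select` are assumptions made for illustration.

```python
def make(schema, rows):
    """Build a relation from a schema string such as "AB" and rows of values."""
    return frozenset(schema), {frozenset(zip(schema, row)) for row in rows}

def select(schema, tuples, predicate):
    """Sketch of selection: keep the tuples t for which predicate(t) holds.

    The predicate receives each tuple as a dict mapping attribute -> value.
    The schema is returned unchanged, even when no tuple qualifies.
    """
    return schema, {t for t in tuples if predicate(dict(t))}

schema, R = make("AB", [(1, 2), (2, 2), (2, 3)])

_, a_is_1 = select(schema, R, lambda t: t["A"] == 1)
_, a2_and_b_ge_2 = select(schema, R, lambda t: t["A"] == 2 and t["B"] >= 2)
empty_schema, empty = select(schema, R, lambda t: t["A"] == 3)

print([sorted(t) for t in a_is_1])
print(len(a2_and_b_ge_2), len(empty), sorted(empty_schema))
```

Keeping the schema separate from the tuples is what lets the empty result still report schema AB.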

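Projection: $\pi$

The projection operation restricts the tuples in a relation to the attributes designated in the operation.

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |

Q =

| A | B | C |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 4 | 5 |

$\pi_A R$ =

| A |
|---|
| 1 |
| 2 |

$\pi_B R$ =

| B |
|---|
| 2 |
| 3 |

$\pi_{BC} Q$ =

| B | C |
|---|---|
| 1 | 1 |
| 4 | 5 |

$\pi_{AB} R = R = \pi_{A,B} R = \pi_{\{A,B\}} R$

Precondition: $X \subseteq \text{schema}(R)$

Postcondition: $\pi_X R = \{ t' \mid \exists t\, (t \in R \land t' = t[X]) \}$ and $\text{schema}(\pi_X R) = X$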
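Projection can be sketched as tuple restriction applied to every tuple, with the result collected into a set so that duplicates collapse. The representation and the names `make` and `project` below are illustrative assumptions.

```python
def make(schema, rows):
    """Build a relation from a schema string such as "ABC" and rows of values."""
    return frozenset(schema), {frozenset(zip(schema, row)) for row in rows}

def project(schema, tuples, attrs):
    """Sketch of projection: restrict every tuple to the attributes in attrs."""
    attrs = frozenset(attrs)
    # Precondition: X must be a subset of schema(R).
    if not attrs <= schema:
        raise ValueError("projection attributes must be in the schema")
    # Building a set removes the duplicates that restriction can create.
    return attrs, {frozenset((a, v) for (a, v) in t if a in attrs) for t in tuples}

Q_schema, Q = make("ABC", [(1, 1, 1), (2, 1, 1), (3, 4, 5)])
bc_schema, bc = project(Q_schema, Q, "BC")

print(sorted(bc_schema), [sorted(t) for t in bc])
```

Q has three tuples, but its projection on BC has only two because (1, 1, 1) and (2, 1, 1) agree on B and C.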

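Cross Product: $\times$

Standard cartesian product adapted for relational algebra

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |

S =

| C | D |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |

$R \times S$ =

| A | B | C | D |
|---|---|---|---|
| 1 | 2 | 1 | 1 |
| 1 | 2 | 2 | 2 |
| 1 | 2 | 3 | 3 |
| 2 | 2 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 2 | 3 | 3 |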

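Cross Product (continued…)

R =

| A | B |   |
|---|---|---|
| 1 | 2 | = t' |
| 2 | 2 |   |

S =

| C | D |   |
|---|---|---|
| 1 | 1 |   |
| 2 | 2 |   |
| 3 | 3 | = t'' |

Precondition: $\text{schema}(R) \cap \text{schema}(S) = \emptyset$

Postcondition: $R \times S = \{ t \mid \exists t'\, \exists t''\, (t' \in R \land t'' \in S \land t = t' \cup t'') \}$ and $\text{schema}(R \times S) = \text{schema}(R) \cup \text{schema}(S)$

t' = { (A,1), (B,2) }

t'' = { (C,3), (D,3) }

$t' \cup t''$ = { (A,1), (B,2), (C,3), (D,3) }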
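The postcondition says each result tuple is the union of one tuple from R and one from S. A minimal Python sketch, assuming tuples are frozensets of (attribute, value) pairs; the names `make` and `cross` are hypothetical.

```python
def make(schema, rows):
    """Build a relation from a schema string such as "AB" and rows of values."""
    return frozenset(schema), {frozenset(zip(schema, row)) for row in rows}

def cross(r_schema, r, s_schema, s):
    """Sketch of cross product: union every tuple of R with every tuple of S."""
    # Precondition: the two schemas must be disjoint.
    if r_schema & s_schema:
        raise ValueError("schemas overlap; rename the shared attributes first")
    return r_schema | s_schema, {t1 | t2 for t1 in r for t2 in s}

r_schema, R = make("AB", [(1, 2), (2, 2)])
s_schema, S = make("CD", [(1, 1), (2, 2), (3, 3)])

rs_schema, RS = cross(r_schema, R, s_schema, S)
print(sorted(rs_schema), len(RS))
```

Raising an error on overlapping schemas mirrors the precondition; the usual way around it is to rename the shared attribute in one operand before taking the product.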

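Cross Product (continued…)

What if R and S have the same attribute, e.g. A?

R =

| A | B |   |
|---|---|---|
| 1 | 2 | = t' = { (A,1), (B,2) } |
| 2 | 2 |   |

S =

| C | A |   |
|---|---|---|
| 1 | 1 | = t'' = { (C,1), (A,1) } |
| 2 | 2 |   |
| 3 | 3 | = t''' = { (C,3), (A,3) } |

$t' \cup t''$ = { (A,1), (B,2), (C,1), (A,1) }

Can't do cross product. Solution: rename, $\rho_{A \leftarrow A'} S$.

$R \times \rho_{A \leftarrow A'} S$ =

| A | B | C | A' |
|---|---|---|----|
| 1 | 2 | 1 | 1 |
| 1 | 2 | 2 | 2 |
| 1 | 2 | 3 | 3 |
| 2 | 2 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 2 | 3 | 3 |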

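Natural Join: $\bowtie$

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |

S =

| B | C |
|---|---|
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |

$R \bowtie S$ =

| A | B | C |
|---|---|---|
| 1 | 2 | 1 |
| 2 | 2 | 1 |

$R \bowtie S = \pi_{ABC}\, \sigma_{B=B'} (R \times \rho_{B \leftarrow B'} S)$

• Renaming: $\rho_{B \leftarrow B'} S$ =

| B' | C |
|----|---|
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |

• Cross product: $R \times \rho_{B \leftarrow B'} S$ =

| A | B | B' | C |
|---|---|----|---|
| 1 | 2 | 1 | 2 |
| 1 | 2 | 2 | 1 |
| 1 | 2 | 3 | 2 |
| 2 | 2 | 1 | 2 |
| 2 | 2 | 2 | 1 |
| 2 | 2 | 3 | 2 |

• Selection: $\sigma_{B=B'}$ keeps the rows (1, 2, 2, 1) and (2, 2, 2, 1).
• Projection: $\pi_{ABC}$ drops B', giving the $R \bowtie S$ shown above.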

Join (continued …)

• In general, we can equate 0, 1, 2, or more attributes using $\bowtie$.
• A join is defined as:
– $\text{schema}(R \bowtie S) = \text{schema}(R) \cup \text{schema}(S)$
– $R \bowtie S = \{ t \mid t[\text{schema}(R)] \in R \land t[\text{schema}(S)] \in S \}$
• There are no preconditions; join always works.
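One way to read the join definition operationally: pair up tuples of R and S that agree on every shared attribute, and take their union. The sketch below assumes tuples are frozensets of (attribute, value) pairs; `make` and `natural_join` are illustrative names, and the nested loop is chosen for clarity rather than speed.

```python
def make(schema, rows):
    """Build a relation from a schema string such as "AB" and rows of values."""
    return frozenset(schema), {frozenset(zip(schema, row)) for row in rows}

def natural_join(r_schema, r, s_schema, s):
    """Sketch of natural join: combine tuples that agree on all shared attributes."""
    common = r_schema & s_schema
    result = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            # With no shared attributes, all() is true for every pair,
            # so the join degenerates to a full cross product.
            if all(d1[a] == d2[a] for a in common):
                result.add(t1 | t2)
    return r_schema | s_schema, result

r_schema, R = make("AB", [(1, 2), (2, 2)])
s_schema, S = make("BC", [(1, 2), (2, 1), (3, 2)])
rs_schema, RS = natural_join(r_schema, R, s_schema, S)

# No attributes in common: the result has |R| * |T| tuples.
t_schema, T = make("CD", [(1, 1), (1, 5)])
_, RT = natural_join(r_schema, R, t_schema, T)

print(sorted(rs_schema), len(RS), len(RT))
```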

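Join (continued…)

• 0 attributes in common (full cross product)

R =

| A | B |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 4 | 1 |

S =

| C | D |
|---|---|
| 1 | 1 |
| 1 | 5 |

$R \bowtie S$ =

| A | B | C | D |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 5 |
| 2 | 3 | 1 | 1 |
| 2 | 3 | 1 | 5 |
| 4 | 1 | 1 | 1 |
| 4 | 1 | 1 | 5 |

• 1 attribute in common

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 2 | 3 |

S =

| B | C |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |

$R \bowtie S$ =

| A | B | C |
|---|---|---|
| 1 | 2 | 2 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |

• 2 attributes in common

R =

| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 2 | 4 |
| 2 | 3 | 5 |

S =

| A | B | D |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 2 | 1 |

$R \bowtie S$ =

| A | B | C | D |
|---|---|---|---|
| 2 | 2 | 4 | 2 |
| 2 | 2 | 4 | 1 |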

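Join (continued…)

• We can use renaming to control the $\bowtie$.

R =

| A | B |
|---|---|
| 1 | 2 |
| 2 | 2 |

S =

| B | C |
|---|---|
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |

$S' = \rho_{C \leftarrow A} S$ =

| B | A |
|---|---|
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |

which, with the columns reordered, is

| A | B |
|---|---|
| 2 | 1 |
| 1 | 2 |
| 2 | 3 |

$R \bowtie \rho_{C \leftarrow A} S = R \bowtie S'$ =

| A | B |
|---|---|
| 1 | 2 |

• BTW, observe the equivalence with intersection: R and S' have the same schema, so $R \bowtie S' = R \cap S'$.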

Relational Algebra Expressions

• Relational operators are closed. Thus we can nest expressions:

R =

| A | B |
|---|---|
| 1 | 2 |
| 3 | 4 |

S =

| B | C | D |
|---|---|---|
| 2 | 5 | 1 |
| 2 | 7 | 2 |
| 3 | 2 | 3 |
| 4 | 5 | 4 |

$R \bowtie S$ =

| A | B | C | D |
|---|---|---|---|
| 1 | 2 | 5 | 1 |
| 1 | 2 | 7 | 2 |
| 3 | 4 | 5 | 4 |

$\pi_D\, \sigma_{C=5} (R \bowtie S)$ =

| D |
|---|
| 1 |
| 4 |

• Unary operators have precedence over binary operators; binary operators are left associative.
• We can now do something very useful: ask and answer with relational algebra (almost) any query we can dream up.

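Relational Algebra Queries

• List the prerequisites for EE200.

$\pi_{\text{Prerequisite}}\, \sigma_{\text{Course='EE200'}}\, \text{cp}$ =

| Prerequisite |
|--------------|
| EE005 |
| CS100 |

• When does CS101 meet?

$\pi_{\text{Day,Hour}}\, \sigma_{\text{Course='CS101'}}\, \text{cdh}$ =

| Day | Hour |
|-----|------|
| M | 9AM |
| W | 9AM |
| F | 9AM |

• When and where does EE200 meet?

$\pi_{\text{Day,Hour,Room}}\, \sigma_{\text{Course='EE200'}} (\text{cdh} \bowtie \text{cr})$ =

| Day | Hour | Room |
|-----|------|------|
| Tu | 10AM | 25 Ohm Hall |
| W | 1PM | 25 Ohm Hall |
| Th | 10AM | 25 Ohm Hall |

Our answers are in $(\text{cdh} \bowtie \text{cr})$. We select Course to be EE200. Then we project on Day, Hour, Room.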

Queries (continued…)

• Where can I find Snoopy at 9 am on Monday?

$\pi_{\text{Room}}\, \sigma_{\text{Name='Snoopy'} \land \text{Day='M'} \land \text{Hour='9AM'}} (\text{snap} \bowtie \text{csg} \bowtie \text{cr} \bowtie \text{cdh})$ =

| Room |
|------|
| Turing Aud. |

• Can we rewrite the query more optimally?
• What rules should we use?
– Associativity and commutativity of join
– Distributive laws for select and project
• What strategy should we use?
– Eliminate unnecessary operations
– Make joins as small as possible before execution

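Query Optimization

• "Intuitively" we can write

$\pi_{\text{Room}}\, \sigma_{\text{Name='Snoopy'} \land \text{Day='M'} \land \text{Hour='9AM'}} (\text{snap} \bowtie \text{csg} \bowtie \text{cr} \bowtie \text{cdh})$

as

$\pi_{\text{Room}} (\sigma_{\text{Name='Snoopy'}}\, \text{snap} \bowtie \text{csg} \bowtie \text{cr} \bowtie \sigma_{\text{Day='M'} \land \text{Hour='9AM'}}\, \text{cdh})$

• Why does this execute faster?
• What laws hold that will let us do this?
– $R \bowtie S = S \bowtie R$
– $\sigma_{P_1 \land P_2} E = \sigma_{P_1} \sigma_{P_2} E$
– $\sigma_P (R \bowtie S) = R \bowtie \sigma_P S$ (if all the attributes of P are in S)
• How do we know they hold?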
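Before proving the law that pushes a selection into a join, it can help to check it on a small instance. The sketch below is a spot check, not a proof: it uses relations R(AB) and S(BCD) with a predicate C = 5 that mentions only attributes of S, and the helper names `make`, `join`, and `select` are illustrative.

```python
def make(schema, rows):
    """Build a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(schema, row)) for row in rows}

def join(r, s):
    """Natural join: combine tuples that agree on their shared attributes."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out

def select(r, predicate):
    """Selection: keep tuples for which the predicate holds."""
    return {t for t in r if predicate(dict(t))}

R = make("AB", [(1, 2), (3, 4)])
S = make("BCD", [(2, 5, 1), (2, 7, 2), (3, 2, 3), (4, 5, 4)])
c_is_5 = lambda t: t["C"] == 5   # mentions only attributes of S

late = select(join(R, S), c_is_5)    # select after joining
filtered_s = select(S, c_is_5)       # select first ...
early = join(R, filtered_s)          # ... then join a smaller operand

print(late == early, len(S), len(filtered_s))
```

Selecting first shrinks S from four tuples to two before the join's nested loop runs, which is the intuition behind making joins as small as possible.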

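Proofs for Laws

• To prove $\sigma_{P_1 \land P_2} E = \sigma_{P_1} \sigma_{P_2} E$, we need to prove that two sets are equal. We prove $A = B$ by showing $A \subseteq B \land B \subseteq A$. We show that $A \subseteq B$ by showing that $x \in A \Rightarrow x \in B$.
• Thus, we can do two proofs to prove $\sigma_{P_1 \land P_2} E = \sigma_{P_1} \sigma_{P_2} E$ as follows:

1. $t \in \sigma_{P_1 \land P_2} E$ (premise)
2. $t \in E \land (P_1 \land P_2)(t)$ (def.: $\sigma_P R = \{ t \mid t \in R \land P(t) \}$)
3. $t \in E \land P_1(t) \land P_2(t)$ (identical substitutions & operations)
4. $t \in E \land P_2(t) \land P_1(t)$ (commutative)
5. $t \in \sigma_{P_2} E \land P_1(t)$ (def. of $\sigma$)
6. $t \in \sigma_{P_1} \sigma_{P_2} E$ (def. of $\sigma$)

and

1. $t \in \sigma_{P_1} \sigma_{P_2} E$ (premise)
2. … just go backwards from 6 to 1 in the proof above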

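Alternate Proof

Thus, we can prove $\sigma_{P_1 \land P_2} E = \sigma_{P_1} \sigma_{P_2} E$ as follows:

$\sigma_{P_1 \land P_2} E$
$= \{ t \mid t \in E \land (P_1 \land P_2)(t) \}$ (def.: $\sigma_P R = \{ t \mid t \in R \land P(t) \}$)
$= \{ t \mid t \in E \land P_1(t) \land P_2(t) \}$ (identical substitutions & operations)
$= \{ t \mid t \in E \land P_2(t) \land P_1(t) \}$ (commutative)
$= \{ t \mid t \in \sigma_{P_2} E \land P_1(t) \}$ (def. of $\sigma$)
$= \{ t \mid t \in \sigma_{P_1} \sigma_{P_2} E \}$ (def. of $\sigma$)
$= \sigma_{P_1} \sigma_{P_2} E$ (def. of a relation)

(Derive the right-hand side from the left-hand side.)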

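Proofs for Laws (continued …)

• To prove $\sigma_P (R \bowtie S) = R \bowtie \sigma_P S$, where all attributes of P are in S, we again need to prove that two sets are equal.
• As before, we can convert the lhs to the rhs.

$\sigma_P (R \bowtie S)$
$= \{ t \mid t \in \sigma_P (R \bowtie S) \}$ (def. of a relation)
$= \{ t \mid t \in R \bowtie S \land P(t) \}$ (def.: $\sigma_P R = \{ t \mid t \in R \land P(t) \}$)
$= \{ t \mid t[\text{schema}(R)] \in R \land t[\text{schema}(S)] \in S \land P(t) \}$ (def.: $R \bowtie S = \{ t \mid t[\text{schema}(R)] \in R \land t[\text{schema}(S)] \in S \}$)
$= \{ t \mid t[\text{schema}(R)] \in R \land t[\text{schema}(S)] \in S \land P(t[\text{schema}(S)]) \}$ (all attributes of P are in S)
$= \{ t \mid t[\text{schema}(R)] \in R \land t[\text{schema}(S)] \in \sigma_P S \}$ (def. of $\sigma$)
$= \{ t \mid t \in R \bowtie \sigma_P S \}$ (def. of $\bowtie$)
$= R \bowtie \sigma_P S$ (def. of a relation)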

SQL

Correspondence with Relational Algebra

Assume we have relations R(AB) and S(BC).

• SQL:

    select A
    from R
    where B = 1

Relational algebra: $\pi_A\, \sigma_{B=1} R$

• SQL:

    select B from R
    except
    select B from S

Relational algebra: $\pi_B R - \pi_B S$

• SQL:

    select A, R.B, C
    from R, S
    where R.B = S.B

Relational algebra: $\pi_{A,\,R.B,\,C}\, \sigma_{R.B = S.B} (R \times S) = R \bowtie S$

SQL

Correspondence with Relational Algebra (continued…)

Assume we have relations R(AB) and S(BC). The difference and the join can also be written as follows.

• SQL:

    select R.B from R
    where R.B not in (select S.B from S)

Relational algebra: $\pi_B R - \pi_B S$

• SQL:

    select *
    from R natural join S

Relational algebra: $R \bowtie S$
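The correspondences can be tried out with Python's built-in `sqlite3` module. This is a sketch on made-up data (the rows in R and S are assumptions, not from the slides). Note that SQL tables are bags rather than sets, so `distinct` is added to the `not in` form to match the set-based `except`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table R (A integer, B integer);
    create table S (B integer, C integer);
    insert into R values (1, 1), (2, 1), (3, 2);
    insert into S values (1, 7), (3, 8);
""")

def rows(sql):
    """Run a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())

# Projection of a selection.
proj_sel = rows("select A from R where B = 1")

# Two ways to write the difference of the B columns.
diff_except = rows("select B from R except select B from S")
diff_not_in = rows(
    "select distinct R.B from R where R.B not in (select S.B from S)"
)

# Two ways to write the natural join.
join_where = rows("select A, R.B, C from R, S where R.B = S.B")
join_natural = rows("select A, R.B, C from R natural join S")

print(proj_sel, diff_except, join_where)
```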

SQL Queries

• List the prerequisites for EE200.

    select Prerequisite
    from cp
    where Course = 'EE200'

| Prerequisite |
|--------------|
| EE005 |
| CS100 |

• When does CS101 meet?

    select Day, Hour
    from cdh
    where Course = 'CS101'

| Day | Hour |
|-----|------|
| M | 9AM |
| W | 9AM |
| F | 9AM |

• When and where does EE200 meet?

    select cdh.Course, Day, Hour, Room
    from cdh, cr
    where cdh.Course = 'EE200'
      and cdh.Course = cr.Course

| Course | Day | Hour | Room |
|--------|-----|------|------|
| EE200 | Tu | 10AM | 25 Ohm Hall |
| EE200 | W | 1PM | 25 Ohm Hall |
| EE200 | Th | 10AM | 25 Ohm Hall |

SQL Queries (continued…)

• When and where does EE200 meet? The same query, written with a natural join:

    select Course, Day, Hour, Room
    from cdh natural join cr
    where cdh.Course = 'EE200'

| Course | Day | Hour | Room |
|--------|-----|------|------|
| EE200 | Tu | 10AM | 25 Ohm Hall |
| EE200 | W | 1PM | 25 Ohm Hall |
| EE200 | Th | 10AM | 25 Ohm Hall |

SQL Queries

• List all prerequisite courses.

    select Prerequisite
    from cp

| Prerequisite |
|--------------|
| CS100 |
| EE005 |
| CS100 |
| CS101 |
| CS120 |
| CS101 |
| CS121 |
| CS205 |

    select distinct Prerequisite
    from cp

| Prerequisite |
|--------------|
| CS100 |
| CS101 |
| CS120 |
| CS121 |
| CS205 |
| EE005 |

SQL Queries

• Where can I find Snoopy at 9 am on Monday?

    select Room
    from snap, csg, cr, cdh
    where Name = 'Snoopy' and Day = 'M' and Hour = '9AM'
      and snap.StudentID = csg.StudentID
      and csg.Course = cr.Course
      and cr.Course = cdh.Course

| Room |
|------|
| Turing Aud. |

• List all prereqs of CS750 (including prereqs of prereqs).
– Not possible with non-recursive SQL (unless nesting depth is known)
– Is possible with Datalog

    Rules: prereqOf(x, y) :- cp(y, x).
           prereqOf(x, y) :- prereqOf(x, z), cp(y, z).
    Query: prereqOf(x, 'CS750')?

• To gain more power and flexibility, we typically embed SQL in a high-level language.
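To see what the two prereqOf rules compute, here is a rough Python sketch that applies them repeatedly until no new facts appear (a naive fixpoint). The `cp` contents and every course name other than CS750 are hypothetical, chosen only so the chain of prerequisites is three levels deep.

```python
def prereq_of(cp):
    """Naive fixpoint evaluation of the two prereqOf rules.

    cp is a set of (course, prerequisite) pairs. The result is a set of
    (x, y) pairs meaning "x is a direct or indirect prerequisite of y".
    """
    # Rule 1: prereqOf(x, y) :- cp(y, x).
    closure = {(x, y) for (y, x) in cp}
    while True:
        # Rule 2: prereqOf(x, y) :- prereqOf(x, z), cp(y, z).
        derived = {(x, y) for (x, z) in closure for (y, z2) in cp if z == z2}
        if derived <= closure:   # nothing new: fixpoint reached
            return closure
        closure |= derived

# Hypothetical data: (course, prerequisite) pairs.
cp = {
    ("CS750", "CS500"),
    ("CS500", "CS300"),
    ("CS300", "CS100"),
    ("CS200", "CS100"),
}

# Query: prereqOf(x, 'CS750')?
answer = sorted(x for (x, y) in prereq_of(cp) if y == "CS750")
print(answer)
```

The loop terminates because the set of possible (x, y) pairs is finite and `closure` only grows.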

SQL Queries

• List all prereqs of CS750 (including prereqs of prereqs).

    select cp.Prerequisite
    from cp
    where cp.Course = 'CS750'

    union

    select cp1.Prerequisite
    from cp cp1, cp cp2
    where cp1.Course = cp2.Prerequisite
      and cp2.Course = 'CS750'

    union