Databases 2012
The Relational Algebra
Christian S. Jensen
Computer Science, Aarhus University
What is an Algebra?
An algebra consists of
• values
• operators
• rules
Closure: operations yield values
Examples
• integers with $+$, $-$, $\times$
• sets with $\cup$, $\cap$, $\setminus$
• matrices with $+$ and $\times$
• functions with composition $\circ$ and inversion $^{-1}$
• relations with query operators
Mathematical Relations
An $n$-ary relation on a set $S$ is a subset of $S^n$
Examples
• a binary relation on $\mathbb{R}$ is any subset of $\mathbb{R} \times \mathbb{R}$, for example
{ (1.2, 3.4), (34, 117.363), (53, 0.1234), ... }
• divides is a binary relation on $\mathbb{N}$, a subset of $\mathbb{N} \times \mathbb{N}$
{ (2, 4), (3, 9), (3, 12), (17, 34), (1237, 21029), ... }
• negative is a binary relation on $\mathbb{Z}$, a subset of $\mathbb{Z} \times \mathbb{Z}$
{ (3,-3), (-17,17), (0,0), (2, -2), (-2,2), (87, -87), ...}
• sum is a ternary relation on $\mathbb{N}$, a subset of $\mathbb{N} \times \mathbb{N} \times \mathbb{N}$
{ (3,5,8), (23,14,37), (0,123,123), (42,87,129), ... }
• married to is a binary relation on people
{ (Hillary, Bill), (Bill, Hillary), (Angelina, Brad), ... }
Tables as Relations
A database relation on a data set $D$ consists of
• a schema of attribute names $(a_1, a_2, \dots, a_n)$
• a finite $n$-ary relation on $D$, a subset of $D^n$
A relation is like a table where
• all columns have the same generic type
• no duplicates are allowed
• no other constraints are imposed
We implicitly allow the attributes to be permuted: the order of the columns is irrelevant
Database relations form an algebra with the operators
• union: $\cup$
• intersection: $\cap$
• difference: $\setminus$
• projection: $\pi$
• renaming: $\rho$
• selection: $\sigma$
• Cartesian product: $\times$
• natural join: $\bowtie$
These provide an abstract model of database queries
Relational Operators
Union, Intersection, Difference
The arguments must have the same schema
The result has the same schema
$R \cup S$
$R \cap S$
$R \setminus S$
They compute the corresponding set operations on the relation parts
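To make the three set operators concrete, here is a minimal Python sketch. It assumes a hypothetical representation that is not from the slides: a relation is a pair of a schema tuple and a frozenset of row tuples, and the example relations R and S are invented. Unlike the slides, it compares schemas positionally, so both arguments must list their attributes in the same order.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    """Hypothetical representation: attribute names plus a finite set of rows."""
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def _check_same_schema(r: Relation, s: Relation) -> None:
    # Union, intersection, and difference require identical schemas
    if r.schema != s.schema:
        raise ValueError(f"schemas differ: {r.schema} vs {s.schema}")


def union(r: Relation, s: Relation) -> Relation:
    _check_same_schema(r, s)
    return Relation(r.schema, r.rows | s.rows)


def intersection(r: Relation, s: Relation) -> Relation:
    _check_same_schema(r, s)
    return Relation(r.schema, r.rows & s.rows)


def difference(r: Relation, s: Relation) -> Relation:
    _check_same_schema(r, s)
    return Relation(r.schema, r.rows - s.rows)


R = Relation(("name", "city"), frozenset({("Ann", "Aarhus"), ("Bo", "Odense")}))
S = Relation(("name", "city"), frozenset({("Bo", "Odense"), ("Cy", "Vejle")}))

print(sorted(union(R, S).rows))         # [('Ann', 'Aarhus'), ('Bo', 'Odense'), ('Cy', 'Vejle')]
print(sorted(intersection(R, S).rows))  # [('Bo', 'Odense')]
print(sorted(difference(R, S).rows))    # [('Ann', 'Aarhus')]
```

Because the relation part is a Python frozenset, the operators map directly onto `|`, `&`, and `-`, and duplicates can never arise.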
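Projection
$\pi_{a_1, \dots, a_n}(R)$
Assume the schema of $R$ is $(a_1, \dots, a_n, b_1, \dots, b_m)$
The schema of the result is $(a_1, \dots, a_n)$
The result relation is
$\{ (d_1, \dots, d_n) \mid (d_1, \dots, d_{n+m}) \in R \}$
Rows that become identical after dropping the $b_j$ are merged, since the result is again a set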
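A minimal Python sketch of projection, assuming a hypothetical representation of a relation as a schema tuple plus a frozenset of rows; the People relation is invented. It also shows duplicate elimination: three input rows collapse to two output rows.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]   # attribute names
    rows: FrozenSet[tuple]    # relation part


def project(attrs: Sequence[str], r: Relation) -> Relation:
    """pi_attrs(r): keep only the named columns, in the given order."""
    positions = [r.schema.index(a) for a in attrs]  # ValueError if an attribute is missing
    # Building a frozenset merges rows that become identical after projection
    rows = frozenset(tuple(row[p] for p in positions) for row in r.rows)
    return Relation(tuple(attrs), rows)


People = Relation(
    ("name", "city", "age"),
    frozenset({("Ann", "Aarhus", 30), ("Bo", "Aarhus", 25), ("Cy", "Vejle", 30)}),
)
cities = project(["city"], People)
print(cities.schema, sorted(cities.rows))  # ('city',) [('Aarhus',), ('Vejle',)]
```

The sketch accepts any subset of the attributes in any order; the slide lists the kept attributes first only to simplify the notation.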
Renaming
$\rho_{a \to b}(R)$
The name $a$ must occur as $a_i$ in the schema of $R$
The name $b$ must not occur in the schema of $R$
Schema of the result: $(a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n)$
The relation part is unchanged
$\rho_{a \to b, c \to d, e \to f}(R) = \rho_{a \to b}(\rho_{c \to d}(\rho_{e \to f}(R)))$
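A minimal Python sketch of single renaming and of a multi-renaming written as nested single renamings. The Relation representation and the Meetings schema below are hypothetical illustrations, not taken from the slides.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def rename(a: str, b: str, r: Relation) -> Relation:
    """rho_{a -> b}(r): change attribute name a to b; the rows are untouched."""
    if a not in r.schema:
        raise ValueError(f"{a!r} does not occur in the schema {r.schema}")
    if b in r.schema:
        raise ValueError(f"{b!r} already occurs in the schema {r.schema}")
    return Relation(tuple(b if x == a else x for x in r.schema), r.rows)


Meetings = Relation(("meetid", "what", "owner"), frozenset({("m1", "Budget", "u1")}))
M2 = rename("owner", "userid", Meetings)
print(M2.schema)  # ('meetid', 'what', 'userid')

# A multi-renaming is a nest of single renamings, the innermost applied first:
# rho_{a->b, c->d, e->f}(R) = rho_{a->b}(rho_{c->d}(rho_{e->f}(R)))
M3 = rename("meetid", "id", rename("what", "title", Meetings))
print(M3.schema)  # ('id', 'title', 'owner')
```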
Selection
$\sigma_C(R)$
$C$ is a condition on the attributes of $R$
The resulting schema is unchanged
The relation part is $\{ r \mid r \in R \land C(r) \}$
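A minimal Python sketch of selection. As an assumption of this sketch, the condition $C$ is modelled as a Python function that receives each row as a dict from attribute names to values; the Relation representation and the People data are hypothetical.

```python
from typing import Any, Callable, Dict, FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


Condition = Callable[[Dict[str, Any]], bool]


def select(cond: Condition, r: Relation) -> Relation:
    """sigma_cond(r): keep the rows for which cond holds; the schema is unchanged."""
    # The condition sees each row as a dict from attribute name to value
    kept = frozenset(row for row in r.rows if cond(dict(zip(r.schema, row))))
    return Relation(r.schema, kept)


People = Relation(
    ("name", "city", "age"),
    frozenset({("Ann", "Aarhus", 30), ("Bo", "Aarhus", 25), ("Cy", "Vejle", 30)}),
)
result = select(lambda t: t["city"] == "Aarhus" and t["age"] > 26, People)
print(sorted(result.rows))  # [('Ann', 'Aarhus', 30)]
```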
Cartesian Product
$R \times S$
Assume
• $R$ has schema $(a_1, \dots, a_m)$
• $S$ has schema $(b_1, \dots, b_n)$
The new schema is $(a_1, \dots, a_m, b_1, \dots, b_n)$
The relation part is
$\{ (c_1, \dots, c_{m+n}) \mid (c_1, \dots, c_m) \in R \land (c_{m+1}, \dots, c_{m+n}) \in S \}$
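A minimal Python sketch of the Cartesian product, assuming a hypothetical schema-plus-frozenset representation and invented data. The check that the two schemas share no attribute names is an assumption of this sketch (the slide leaves it implicit); without it the result schema would repeat a name, so overlapping names should be renamed first.

```python
from itertools import product as cartesian
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def product(r: Relation, s: Relation) -> Relation:
    """r x s: concatenate every row of r with every row of s."""
    # Assumption of this sketch: attribute names must be disjoint
    shared = set(r.schema) & set(s.schema)
    if shared:
        raise ValueError(f"shared attribute names: {shared}; rename first")
    rows = frozenset(x + y for x, y in cartesian(r.rows, s.rows))
    return Relation(r.schema + s.schema, rows)


R = Relation(("a",), frozenset({(1,), (2,)}))
S = Relation(("b", "c"), frozenset({("x", True), ("y", False), ("z", True)}))
RS = product(R, S)
print(RS.schema, len(RS.rows))  # ('a', 'b', 'c') 6
```

The result always has $|R| \cdot |S|$ rows, which is why unrestricted products are expensive.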
Natural Join
$R \bowtie S$
Assume
• $R$ has schema $(a_1, \dots, a_k, c_1, \dots, c_n)$
• $S$ has schema $(c_1, \dots, c_n, b_1, \dots, b_m)$
• $\{a_1, \dots, a_k\} \cap \{b_1, \dots, b_m\} = \emptyset$
The new schema is $(a_1, \dots, a_k, c_1, \dots, c_n, b_1, \dots, b_m)$
The relation part is
$\{ (d_1, \dots, d_k, e_1, \dots, e_n, f_1, \dots, f_m) \mid (d_1, \dots, d_k, e_1, \dots, e_n) \in R \land (e_1, \dots, e_n, f_1, \dots, f_m) \in S \}$
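A minimal nested-loop sketch of the natural join in Python, assuming a hypothetical schema-plus-frozenset representation; the Emp and Dept relations are invented. The shared attributes play the role of the $c_i$, and the result schema is R's attributes followed by S's remaining ones, as on the slide.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def natural_join(r: Relation, s: Relation) -> Relation:
    """r join s: combine rows that agree on all shared attributes."""
    shared = [c for c in r.schema if c in s.schema]      # the c_i
    s_only = [b for b in s.schema if b not in r.schema]  # the b_i
    r_pos = [r.schema.index(c) for c in shared]
    s_pos = [s.schema.index(c) for c in shared]
    s_rest = [s.schema.index(b) for b in s_only]
    rows = frozenset(
        x + tuple(y[j] for j in s_rest)
        for x in r.rows
        for y in s.rows
        if all(x[i] == y[j] for i, j in zip(r_pos, s_pos))
    )
    return Relation(r.schema + tuple(s_only), rows)


Emp = Relation(("name", "dept"), frozenset({("Ann", "CS"), ("Bo", "Math"), ("Cy", "CS")}))
Dept = Relation(("dept", "building"), frozenset({("CS", "B1"), ("Bio", "B2")}))
J = natural_join(Emp, Dept)
print(J.schema, sorted(J.rows))
# ('name', 'dept', 'building') [('Ann', 'CS', 'B1'), ('Cy', 'CS', 'B1')]
```

The nested loop mirrors the set-builder definition directly; it compares every pair of rows, so real systems use hash or sort-merge joins instead.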
Derived Operators
$R \bowtie S = R \cap S = R \setminus (R \setminus S)$
• when the schemas are identical
$R \bowtie S = R \times S$
• when the schemas are disjoint
$R \bowtie_\theta S = \sigma_\theta(R \times S)$
• the theta join
SELECT DISTINCT X1, …, Xk
FROM R1, …, Rn
WHERE C
$= \pi_{X_1, \dots, X_k}(\sigma_C(R_1 \times \dots \times R_n))$
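The theta join and the SQL correspondence can be checked mechanically. This Python sketch (hypothetical representation and invented data) defines the theta join as $\sigma_\theta(R \times S)$, evaluates $\pi_{a,c}(\sigma_{b<c}(R \times S))$, and compares it with the corresponding SELECT DISTINCT query run in an in-memory SQLite database through Python's standard `sqlite3` module. It ends with the identity $R \cap S = R \setminus (R \setminus S)$ on plain Python sets.

```python
import sqlite3
from itertools import product as cartesian
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def product(r, s):
    return Relation(r.schema + s.schema, frozenset(x + y for x, y in cartesian(r.rows, s.rows)))


def select(cond, r):
    return Relation(r.schema, frozenset(t for t in r.rows if cond(dict(zip(r.schema, t)))))


def project(attrs, r):
    pos = [r.schema.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[p] for p in pos) for t in r.rows))


def theta_join(theta, r, s):
    # Derived operator: R join_theta S = sigma_theta(R x S)
    return select(theta, product(r, s))


R = Relation(("a", "b"), frozenset({(1, 10), (2, 20), (3, 30)}))
S = Relation(("c",), frozenset({(15,), (25,)}))

# pi_{a,c}(sigma_{b<c}(R x S)), the algebra form of the SQL query below
algebra = project(["a", "c"], theta_join(lambda t: t["b"] < t["c"], R, S))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
con.execute("CREATE TABLE S (c INTEGER)")
con.executemany("INSERT INTO R VALUES (?, ?)", sorted(R.rows))
con.executemany("INSERT INTO S VALUES (?)", sorted(S.rows))
sql = frozenset(con.execute("SELECT DISTINCT a, c FROM R, S WHERE b < c").fetchall())

print(sorted(algebra.rows))  # [(1, 15), (1, 25), (2, 25)]
print(algebra.rows == sql)   # True

# Intersection needs no operator of its own: R ∩ S = R \ (R \ S)
x, y = frozenset({1, 2, 3}), frozenset({2, 3, 4})
print((x - (x - y)) == (x & y))  # True
```

DISTINCT matters here: without it SQL keeps duplicate rows, which is the bag behaviour rather than the set behaviour of the algebra.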
Query Trees
In which meetings do the owners participate?
$\pi_{what,meetid}(\sigma_{status=\text{'a'}}(\rho_{owner \to userid}(Meetings) \bowtie \rho_{pid \to userid}(Participants)))$
As a query tree, evaluated from the leaves upwards:
$\pi_{what,meetid}$
  $\sigma_{status=\text{'a'}}$
    $\bowtie$
      $\rho_{owner \to userid}$
        Meetings
      $\rho_{pid \to userid}$
        Participants
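A Python sketch that evaluates the meetings query bottom-up, following the tree. The schemas Meetings(meetid, what, owner) and Participants(pid, meetid, status), and all rows, are hypothetical: the slides only name the attributes used in the query and do not say what status 'a' means. After both renamings, userid and meetid are shared, so the join matches an owner with their own meeting.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def rename(a, b, r):
    if a not in r.schema or b in r.schema:
        raise ValueError(f"cannot rename {a!r} to {b!r} in {r.schema}")
    return Relation(tuple(b if x == a else x for x in r.schema), r.rows)


def natural_join(r, s):
    shared = [c for c in r.schema if c in s.schema]
    extra = [i for i, b in enumerate(s.schema) if b not in r.schema]
    rows = frozenset(
        rt + tuple(st[i] for i in extra)
        for rt in r.rows
        for st in s.rows
        if all(rt[r.schema.index(c)] == st[s.schema.index(c)] for c in shared)
    )
    return Relation(r.schema + tuple(s.schema[i] for i in extra), rows)


def select(cond, r):
    return Relation(r.schema, frozenset(t for t in r.rows if cond(dict(zip(r.schema, t)))))


def project(attrs, r):
    pos = [r.schema.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[p] for p in pos) for t in r.rows))


# Hypothetical example data
Meetings = Relation(("meetid", "what", "owner"), frozenset({
    ("m1", "Budget", "u1"), ("m2", "Lunch", "u2"), ("m3", "Review", "u3")}))
Participants = Relation(("pid", "meetid", "status"), frozenset({
    ("u1", "m1", "a"), ("u2", "m2", "d"), ("u3", "m1", "a"), ("u3", "m3", "a")}))

# Leaves -> renamings -> join -> selection -> projection
joined = natural_join(rename("owner", "userid", Meetings),
                      rename("pid", "userid", Participants))
answer = project(["what", "meetid"], select(lambda t: t["status"] == "a", joined))
print(sorted(answer.rows))  # [('Budget', 'm1'), ('Review', 'm3')]
```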
Limitations
The relational algebra cannot express all queries, for example:
Which cities can be reached from Copenhagen in one or more flights?
Flights
from        to
Copenhagen  Madrid
Rome        London
Madrid      Athens
Athens      Rome
...         ...
Transitive Closure
The transitive closure of a binary relation $R$ is
$R^+ = \{ (x_1, x_k) \mid k \ge 2,\ \exists x_2, \dots, x_{k-1}\ \forall i \in \{1, \dots, k-1\} : (x_i, x_{i+1}) \in R \}$
No relational algebra expression computes $R^+$
No SQL query can compute it either
• unless SQL is extended with recursion (recursive queries with WITH RECURSIVE were standardized in SQL:1999)
• or a special closure operator is added
• (some DBMSs do support this)
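A Python sketch of two ways to compute the Flights query, using only the four rows listed in the table (the elided rows are not filled in). The first is a fixpoint loop that keeps composing the closure with the relation until nothing new appears; intuitively, the number of rounds depends on the data, which a fixed algebra expression cannot adapt to. The second uses a recursive common table expression in SQLite through Python's `sqlite3` module, assuming the bundled SQLite supports WITH RECURSIVE, as current versions do. The columns are renamed src and dst here to avoid the reserved words FROM and TO.

```python
import sqlite3

# The four rows shown in the Flights table (the "..." rows are omitted)
flights = {("Copenhagen", "Madrid"), ("Rome", "London"),
           ("Madrid", "Athens"), ("Athens", "Rome")}


def transitive_closure(r):
    """Fixpoint iteration: add (x, z) whenever (x, y) is known and (y, z) is in r."""
    closure = set(r)
    while True:
        step = {(x, z) for (x, y) in closure for (y2, z) in r if y == y2}
        if step <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= step


reachable = {y for (x, y) in transitive_closure(flights) if x == "Copenhagen"}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (src TEXT, dst TEXT)")
con.executemany("INSERT INTO flights VALUES (?, ?)", sorted(flights))
rows = con.execute("""
    WITH RECURSIVE reach(city) AS (
        SELECT dst FROM flights WHERE src = 'Copenhagen'
        UNION
        SELECT f.dst FROM flights AS f JOIN reach AS r ON f.src = r.city
    )
    SELECT city FROM reach
""").fetchall()
sql_reachable = {city for (city,) in rows}

print(sorted(reachable))         # ['Athens', 'London', 'Madrid', 'Rome']
print(sql_reachable == reachable)  # True
```

Using UNION rather than UNION ALL in the recursive CTE discards cities already found, so the recursion also terminates on cyclic flight graphs.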
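Algebraic Laws (1/3)
$x \cup x = x$ idempotence
$x \cup y = y \cup x$ commutativity
$x \bowtie x = x$ idempotence
$x \bowtie y = y \bowtie x$ commutativity
$x \cup (y \cup z) = (x \cup y) \cup z$ associativity
$x \bowtie (y \bowtie z) = (x \bowtie y) \bowtie z$ associativity
$x \bowtie (y \cup z) = (x \bowtie y) \cup (x \bowtie z)$ distributivity
Algebraic Laws (2/3)
$\sigma_C(x \cup y) = \sigma_C(x) \cup \sigma_C(y)$ distributivity
$\sigma_C(x \setminus y) = \sigma_C(x) \setminus \sigma_C(y) = \sigma_C(x) \setminus y$ distributivity
$\sigma_C(x \bowtie y) = \sigma_C(x) \bowtie \sigma_C(y)$ distributivity
$\sigma_C(x \cap y) = \sigma_C(x) \cap \sigma_C(y)$ distributivity
$\sigma_C(x) = \sigma_C(\sigma_C(x))$ idempotence
$\sigma_C(\sigma_D(x)) = \sigma_D(\sigma_C(x))$ commutativity
$\sigma_{C \land D}(x) = \sigma_C(\sigma_D(x)) = \sigma_C(x) \bowtie \sigma_D(x)$ splitting
$\sigma_{C \lor D}(x) = \sigma_C(x) \cup \sigma_D(x)$ splitting
$\sigma_{\lnot C}(x) = x \setminus \sigma_C(x)$ splitting
Algebraic Laws (3/3)
$\pi_a(x \cup y) = \pi_a(x) \cup \pi_a(y)$ distributivity
(does not hold for $\setminus$ and $\cap$)
$\rho_{a \to b}(x \cup y) = \rho_{a \to b}(x) \cup \rho_{a \to b}(y)$ distributivity
$\rho_{a \to b}(x \setminus y) = \rho_{a \to b}(x) \setminus \rho_{a \to b}(y)$ distributivity
$\rho_{b \to c}(\rho_{a \to b}(x)) = \rho_{a \to c}(x)$ cancellation
$\rho_{a \to b}(\rho_{c \to d}(x)) = \rho_{c \to d}(\rho_{a \to b}(x))$ commutativity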
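A quick randomized check of several of these laws in Python. The setup is hypothetical: relations over a fixed two-attribute schema (a, b) are plain frozensets of pairs, C and D are arbitrary sample conditions, and because both operands share a schema, $\bowtie$ is computed as $\cap$. Passing such tests is evidence, not a proof; the last lines give a concrete counterexample for projection over intersection.

```python
import random


def sel(pred, x):
    return frozenset(t for t in x if pred(t))


def proj_a(x):
    # pi_a over the schema (a, b)
    return frozenset((a,) for (a, b) in x)


def random_relation(rng):
    return frozenset((rng.randrange(3), rng.randrange(3)) for _ in range(rng.randrange(6)))


def C(t):
    return t[0] == 1


def D(t):
    return t[1] >= 1


rng = random.Random(0)
for _ in range(1000):
    x, y = random_relation(rng), random_relation(rng)
    assert sel(C, x | y) == sel(C, x) | sel(C, y)
    assert sel(C, x - y) == sel(C, x) - sel(C, y) == sel(C, x) - y
    assert sel(C, x & y) == sel(C, x) & sel(C, y)
    assert sel(lambda t: C(t) and D(t), x) == sel(C, sel(D, x)) == sel(C, x) & sel(D, x)
    assert sel(lambda t: C(t) or D(t), x) == sel(C, x) | sel(D, x)
    assert sel(lambda t: not C(t), x) == x - sel(C, x)
    assert proj_a(x | y) == proj_a(x) | proj_a(y)

# Projection does not distribute over intersection:
x, y = frozenset({(1, 1)}), frozenset({(1, 2)})
print(proj_a(x & y), proj_a(x) & proj_a(y))  # frozenset() frozenset({(1,)})
```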
Zero and Unit
Define $0$ = the empty relation (for each schema)
Define $1$ as follows
• the schema is empty
• the relation contains the single empty row
$0 \cup x = x \cup 0 = x$
$0 \bowtie x = x \bowtie 0 = 0$
$1 \bowtie x = x \bowtie 1 = x$
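A Python sketch of the two constants, assuming a hypothetical schema-plus-frozenset representation. Here $1$ is the relation with the empty schema `()` whose only row is the empty tuple `()`, so joining with it concatenates nothing onto each row. The $0$ on the right-hand side of $0 \bowtie x = 0$ is the empty relation over the combined schema.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def natural_join(r, s):
    shared = [c for c in r.schema if c in s.schema]
    extra = [i for i, b in enumerate(s.schema) if b not in r.schema]
    rows = frozenset(
        rt + tuple(st[i] for i in extra)
        for rt in r.rows
        for st in s.rows
        if all(rt[r.schema.index(c)] == st[s.schema.index(c)] for c in shared)
    )
    return Relation(r.schema + tuple(s.schema[i] for i in extra), rows)


def union(r, s):
    if r.schema != s.schema:
        raise ValueError("schemas differ")
    return Relation(r.schema, r.rows | s.rows)


ONE = Relation((), frozenset({()}))  # empty schema, single empty row

x = Relation(("name", "city"), frozenset({("Ann", "Aarhus"), ("Bo", "Odense")}))
zero_x = Relation(x.schema, frozenset())         # 0 with the schema of x
zero_c = Relation(("city", "zip"), frozenset())  # 0 with another schema

print(natural_join(ONE, x) == x, natural_join(x, ONE) == x)  # True True
print(union(zero_x, x) == x)                                 # True
print(natural_join(zero_c, x).rows)                          # frozenset()
```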
Division
Division Example
Completed
student  task
Fred     Database1
Fred     Database2
Fred     Compiler1
Eugene   Database1
Eugene   Compiler1
Eugene   Compiler2
Sara     Database1
Sara     Database2
John     Usability1
dDB
task
Database1
Database2
$\mathit{Completed} \div \mathit{dDB}$
student
Fred
Sara
Those students that have completed all the dDB tasks
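A Python sketch of division on the example data. It computes the result twice: directly from the meaning ("every required task is completed"), and through the textbook derivation $R \div S = \pi_A(R) \setminus \pi_A((\pi_A(R) \times S) \setminus R)$, where $A$ are the attributes of $R$ that are not in $S$. That identity is the standard way to express division with the basic operators, shown here as an illustration rather than as the slides' own definition. Plain Python sets stand in for relations.

```python
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"), ("Eugene", "Compiler2"),
    ("Sara", "Database1"), ("Sara", "Database2"),
    ("John", "Usability1"),
}
ddb = {"Database1", "Database2"}


def divide(r, s):
    """Students a such that (a, b) is in r for every task b in s."""
    candidates = {a for (a, _) in r}
    return {a for a in candidates if all((a, b) in r for b in s)}


def divide_derived(r, s):
    # R / S = pi_A(R) - pi_A((pi_A(R) x S) - R)
    pi_a = {a for (a, _) in r}
    missing = {(a, b) for a in pi_a for b in s} - r  # required pairs not completed
    return pi_a - {a for (a, _) in missing}


print(sorted(divide(completed, ddb)))                            # ['Fred', 'Sara']
print(divide(completed, ddb) == divide_derived(completed, ddb))  # True
```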
Algebraic Query Optimization
Rewritings may improve efficiency:
$(A \bowtie B) \bowtie C \;\longrightarrow\; A \bowtie (B \bowtie C)$
$\sigma_C(A \cup B) \;\longrightarrow\; \sigma_C(A) \cup \sigma_C(B)$
Whether a rewriting pays off depends on the predicates (selectivities) and the specific instances
[Figure: two evaluation orders for joining inputs of $10$, $10^6$, and $10^6$ rows; one order produces an intermediate result of $10^{12}$ rows, the other only $10$ rows]
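A small Python illustration of why join order matters, echoing the figure with smaller, invented numbers: A has one row, B and C have 100 rows each, and every row of B matches every row of C. Both orders give the same answer, but one builds a 1-row intermediate result and the other a 10,000-row one. The Relation representation and the data are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def natural_join(r, s):
    shared = [c for c in r.schema if c in s.schema]
    extra = [i for i, b in enumerate(s.schema) if b not in r.schema]
    rows = frozenset(
        rt + tuple(st[i] for i in extra)
        for rt in r.rows
        for st in s.rows
        if all(rt[r.schema.index(c)] == st[s.schema.index(c)] for c in shared)
    )
    return Relation(r.schema + tuple(s.schema[i] for i in extra), rows)


# Hypothetical inputs: A is tiny, while B and C share a single y value
A = Relation(("x",), frozenset({(0,)}))
B = Relation(("x", "y"), frozenset((i, 0) for i in range(100)))
C = Relation(("y", "z"), frozenset((0, j) for j in range(100)))

ab = natural_join(A, B)  # intermediate result for (A join B) join C
bc = natural_join(B, C)  # intermediate result for A join (B join C)
left = natural_join(ab, C)
right = natural_join(A, bc)

print(len(ab.rows), len(bc.rows))     # 1 10000
print(left == right, len(left.rows))  # True 100
```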
Rules of Thumb
Push selections down the expression tree
Push projections down the expression tree
Order joins based on size estimates
In general, search for a good expression tree
• use heuristics
• use statistics: table sizes, distinct values for attributes,
histograms, etc.
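A Python sketch of the first rule of thumb. It relies on the standard equivalence $\sigma_C(R \times S) = \sigma_C(R) \times S$ when $C$ mentions only attributes of $R$; the Relation representation, the data, and the condition are hypothetical. Filtering first shrinks the product from 10,000 to 1,000 rows without changing the result.

```python
from itertools import product as cartesian
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def product(r, s):
    return Relation(r.schema + s.schema, frozenset(x + y for x, y in cartesian(r.rows, s.rows)))


def select(cond, r):
    return Relation(r.schema, frozenset(t for t in r.rows if cond(dict(zip(r.schema, t)))))


R = Relation(("id", "kind"), frozenset((i, i % 10) for i in range(200)))
S = Relation(("sid",), frozenset((j,) for j in range(50)))


def is_kind_3(t):
    return t["kind"] == 3  # mentions only attributes of R


late = select(is_kind_3, product(R, S))  # product first: 200 * 50 = 10000 rows built
filtered = select(is_kind_3, R)          # selection first: 20 rows survive
early = product(filtered, S)             # then only 20 * 50 = 1000 rows built

print(late == early, len(filtered.rows), len(early.rows))  # True 20 1000
```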
Bag Algebra
Allows relations to contain duplicate entries
Sets are replaced by bags
The bag versions of $\cup$, $\cap$, and $\setminus$ count copies: union adds the multiplicities, intersection takes the minimum, and difference subtracts (never going below zero)
The bag versions of $\pi$, $\sigma$, $\times$, and $\bowtie$ keep duplicates
A better match with real-life SQL than sets
Still does not account for the order of the tuples
• SQL offers some support for ordering
• Tuples in a relation are stored on disk in some order
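Python's `collections.Counter` behaves like a bag, which makes for a compact sketch: `+` adds multiplicities (like SQL's UNION ALL), `&` takes the minimum (like INTERSECT ALL), and `-` subtracts and drops counts that reach zero or below (like EXCEPT ALL). The rows and the helper `bag_project` are hypothetical illustrations.

```python
from collections import Counter

# A bag maps each row to its multiplicity (hypothetical example data)
x = Counter({("a",): 2, ("b",): 1})
y = Counter({("a",): 1, ("c",): 3})

bag_union = x + y         # adds multiplicities
bag_intersection = x & y  # minimum of multiplicities
bag_difference = x - y    # subtracts, dropping counts <= 0

print(sorted(bag_union.elements()))         # [('a',), ('a',), ('a',), ('b',), ('c',), ('c',), ('c',)]
print(sorted(bag_intersection.elements()))  # [('a',)]
print(sorted(bag_difference.elements()))    # [('a',), ('b',)]


def bag_project(positions, bag):
    """Bag projection: keep duplicates by summing multiplicities."""
    out = Counter()
    for row, n in bag.items():
        out[tuple(row[p] for p in positions)] += n
    return out


people = Counter({("Ann", "Aarhus"): 1, ("Bo", "Aarhus"): 1})
print(bag_project([1], people))  # Counter({('Aarhus',): 2})
```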
Algebraic Laws for Bags
Fewer algebraic laws are valid for the bag algebra
Counterexamples
$x \cap (y \cup z) = (x \cap y) \cup (x \cap z)$ fails for $x = y = z = \{42\}$
$\sigma_{C \lor D}(x) = \sigma_C(x) \cup \sigma_D(x)$ fails for $C = D = \mathit{true}$ and any nonempty $x$
Beware when optimizing bag queries!
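Both counterexamples can be replayed with `collections.Counter` as the bag type (bag union is `+`, bag intersection is `&`). The single-row bags and the helper `bag_select` are hypothetical illustrations of the values on the slide.

```python
from collections import Counter


def bag_select(pred, bag):
    """Bag selection: keep each qualifying row with its multiplicity."""
    return Counter({row: n for row, n in bag.items() if pred(row)})


# Counterexample 1: x ∩ (y ∪ z) versus (x ∩ y) ∪ (x ∩ z), with x = y = z = {42}
x = y = z = Counter({(42,): 1})
lhs = x & (y + z)        # min(1, 2) = 1 copy
rhs = (x & y) + (x & z)  # 1 + 1 = 2 copies
print(lhs[(42,)], rhs[(42,)])  # 1 2


# Counterexample 2: σ_{C∨D}(x) versus σ_C(x) ∪ σ_D(x), with C = D = true
def C(row):
    return True


D = C
bag = Counter({("a",): 1})
left = bag_select(lambda row: C(row) or D(row), bag)  # 1 copy
right = bag_select(C, bag) + bag_select(D, bag)       # 2 copies
print(left[("a",)], right[("a",)])  # 1 2
```

In both cases the set versions agree, because a set cannot hold the second copy.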