Upload
alejandro-mcdowell
View
223
Download
2
Tags:
Embed Size (px)
Citation preview
Dr. A.I. Cristea
http://www.dcs.warwick.ac.uk/~acristea/
CS 319: Theory of Databases
2
… previous
Armstrong axioms
3
Content1. Generalities DB2. Integrity constraints (FD revisited)3. LLJ, DP and applications4. Relational Algebra (revisited)5. Query optimisation6. Temporal Data7. The Askew Wall8. Tuple calculus9. Domain calculus10. Query equivalence
4
Lossless Join Decomposition
• Lossless Join Definition:– Let { R
1 , R
2 } be a decomposition of R (meaning that
R1 R
2 = R); the decomposition is lossless if for every
legal instance r of R:
r = R1
(r) R2
(r)
• What is wrong with the following decomposition? – R = {A,B,C} and F = { A B, C B} and we replace R
by { R1 , R
2 } where R
1 = {A,B} and R
2 = {C,B}.
5
Sufficient Condition for Lossless Join
• Lossless Join means:– Let { R
1 , R
2 } be a decomposition of R (meaning that
R1 R
2 = R);
• Prove that for all legal instances r: r
R1(r)
R2(r)
• Prove that this decomposition is lossless if R
1 R
2 R
1 or R
1 R
2 R
2
– Can you give an example of a lossless join decomposition (instance) when neither
R1 R
2 R
1 nor R
1 R
2 R
2 hold?
6
Boyce-Codd Normal Form (BCNF)
• A relation scheme R is in BCNF if (and only if) for every non-trivial fd X Y F+, X is a superkey (for R).
• A database scheme D = {R1,..., R
n} is in BCNF if (and
only if) i {1,...,n}: Ri is in BCNF.
• Let R = {A,B,C} and F = { A B, C B} and let us decompose R into by { R
1 , R
2 } where R
1= {A,B} and
R2 = {C,B}. Is this decomposition in BCNF? Is this
the “best” decomposition in BCNF? (Can you find a better one?)
7
BCNF Decomposition Algorithmresult := {R};done := false;compute F+;while (not done) do
if (there is a schema Ri in result that is not in BCNF)then begin
let αβ be a nontrivial functional dependencythat holds on Ri such that αRi is not in F+,and αβ=;result := (result – Ri) (Ri – β) (α, β);
endelse done:= true;
8
Dependencies in a decomposition
• Which dependencies hold in R1 and R
2?
– R = {A,B,C} and F = { A B, B C} and we replace R by { R
1 , R
2 } where R
1 = {A,B} and R
2 = {B,C}.
– R = {A,B,C} and F = { A B, C B} and we replace R by { R
1 , R
2 } where R
1 = {A,B} and R
2 = {A,C}.
– R = {A,B,C} and F = { A B, B C} and we replace R by { R
1 , R
2 } where R
1 = {A,B} and R
2 = {A,C}
9
Third Normal Form (3NF)• Third Normal Form
– Informal Presentation– Example and Discussion– Formal Definition
• 3NF Decomposition Algorithm– Principle and Properties
• Lossless-join, dependency-preserving decomposition into 3NF
– Proof of Correctness– Example of 3NF Decomposition
• Third Normal Form and Boyce-Codd Normal Form
10
Informal Presentation
• Motivation– There are some situations where
• BCNF decomposition is not dependency preserving, and • Efficient checking for FD violation on updates is important
– Solution• Define a weaker normal form, called Third Normal Form
– FDs can be checked on individual relations without computing a join
– There is always a lossless-join, dependency-preserving decomposition into 3NF
11
Informal Presentation• Motivation
– Sometimes a relational schema and its FDs are not in BCNF but one does not want to decompose it further
– Example:• Relation Bookings with attributes:
– title, the name of the performance– theater, the name of the theater where the
performance is being shown– city, where the theater is located
• FDs are: theater city, title city theater• Is there a BCNF violation?
Bookings(title, theater, city)theater citytitle city theater
Yes: (theater city) because theater is not a superkeyNote: keys here are: (title, city) and (theater, title)
BCNF Decomposition:Bookings1(theater, city)Bookings2(title, theater)
12
Informal Presentation• Motivation
– Decomposition to get to BCNF may not always be desirable• BCNF decomposition is not dependency preserving, and • Efficient checking for FD violation on updates is important
– 3NF relaxes BCNF to allow relations that cannot be decomposed into BCNF relations without losing ability to check each FD
• Informal Definition of 3NF– A relation R is in third normal form if:
Whenever A B is a nontrivial FD:either A is a superkey or B is a member of some candidate key
Bookings(title, theater, city)theater citytitle city theater
As for BCNF
13
Informal Presentation• Informal Definition of 3NF
– A relation R is in third normal form if:
• The difference between BCNF and 3NF:– “B is a member of some candidate key”
– Previous example schema is in 3NF• Candidate keys here are: (title, city) and (theater, title)• Theater is not a superkey but city is a member of a candidate
key• What is the problem with this schema?
Whenever A B is a nontrivial FD:either A is a superkey or B is a member of some candidate key
Bookings(title, theater, city)theater citytitle city theater
14
Informal Presentation• Informal Definition of 3NF
– Previous example schema is in 3NF• What is the problem with this schema?
– The schema contains redundant information
Bookings(title, theater, city)theater citytitle city theater
LondonImperialBeethoven’s 5th Symphony
LondonNew TheaterCats
LondonImperialPhantom of the Opera
New YorkBroadwayCats
CityTheaterTitle
15
Formal Definition 3NF• Definition
– A relation schema R is in third normal form (3NF) if • for all functional dependencies in F+ of the form ,
where R and R, at least one of the following holds:
is a trivial functional dependency ( )
contains a key for R – every B is part of some candidate key of R
• BCNF and 3NF– A BCNF relation is in 3NF– A 3NF relation is not necessary in BCNF
BCNF Conditions
16
Formal Presentation• Example
– Consider the two relational schemas• R1 = (cust-num, name, house-num, street, city, state)
cust-num name, house-num, street, city, state
• R2 = (house-num, street, city, state, zip)house-num, street, city, state zipzip state
– Are these relations in 3NF?
17
Formal Presentation• Example in 3NF?
– For R1
• The only nontrivial functional dependencies in F+ are those with cust-num as a member of the left-side of the FD
• As cust-num is a superkey of R1, these functional dependencies satisfy the second condition for 3NF
R1= (cust-num, name, house-num, street, city, state)cust-num name, house-num, street, city, state
Three conditions for 3NF:• is a trivial functional dependency ( )• contains a key for R • Every B is part of some candidate key of R
18
Formal Presentation• Example in 3NF?
– For R2
• There are two kinds of nontrivial functional dependencies in F+:– Those with (house-num, street, city, state) as a subset of the left
hand side of the FD: As (house-num, street, city, state) is a superkey for R2, these functional dependencies satisfy the second condition for 3NF
– Those of the form {zip} {state} where For any such functional dependency:
( {state}) – ( {zip}) = {state} (or = )Because state is part of a candidate key of R2, such functional
dependencies satisfy the third condition for 3NF
R2 = (house-num, street, city, state, zip)house-num, street, city, state zipzip state
Three conditions for 3NF:• is a trivial functional dependency • contains a key for R • Every B is part of some candidate key of R
19
Decomposition into 3NF• Principles
– Input/Output• Input
– A set of functional dependencies F – A relation schema R
• Output– A lossless-join, dependency-preserving
decomposition in 3NF
– Canonical Cover• The set of dependencies Fc in the algorithm is a
canonical cover of the functional dependencies
20
Fc definition
a canonical cover Fc for F is a set of dependencies Fc for which:
• Fc <=> F • no fd in Fc is superfluous• no fd in Fc contains extraneous attrs• each left side of fd in Fc is unique
21
Extraneous attribute A in α→β in R
• Aα; F => F – {α→β} α-A)→β• Aβ; F – {α→β} α→(β -A)=> F
• Computed via attribute closures
22
Fc computation algorithm
Fc = F
Repeat apply union rule (right side of fd)
find fd with extraneous attrs (left/right side)
& delete these
Until Fc doesn’t change
23
Decomposition into 3NF
• Principles– The algorithm takes a set of dependencies and adds
one schema at a time, instead of decomposing the initial schema repeatedly
– The result is not uniquely defined since • A set of functional dependencies can have more than one
canonical cover• In some cases, the result of the algorithm depends on the order
which it considers the dependencies in Fc(minor bug in the algorithm, see later)
24
Decomposition into 3NF• Decomposition
– Given: relation R, set F of functional dependencies– Find: decomposition of R into a set of 3NF relation Ri
– Algorithm (sketch, real algorithm on next slides):
– Decomposition produces a lossless join and preserves dependencies– Prove !
(1) Eliminate redundant fd, resulting in a canonical cover Fc of F(2) Create a relation Ri = XY for each FD X Y in Fc(3) If the key K of R does not occur in any relation Ri, create one
more relation Ri=K
25
Let Fc be the canonical cover of F;j = 0;for each dependency α β in Fc
if none of schemes in Ri (i=1, 2, …, j) contains αβ then j = j+1; Rj = αβ;
end-ifif any of the schemes in Ri (i=1, 2, …, j-1) is contained in Rj
remove Riend-if
end-forif none of the schemes Ri (i=1, 2, …, j) contains a candidate key for R
thenj = j + 1;Rj = any candidate key for R;
end-ifreturn (R1, R2, …, Rj)
Decomposition Algorithm into 3NF
26
Decomposition into 3NF
• Example– Semester database of a university– Relational schema R=(L, I, T, R, S, G)– Attributes
• L: Lecture R: Room• I: Instructor S: Student• G: Grade T: Time
– Functional Dependencies• L I, TR L, TI R, LS G, TS R, TRI LR
R=(L, I, T, R, S, G)L I, TR L, TI R, LS G, TS R, TRI LR
27
Decomposition into 3NF• Example
– R=(L, I, T, R, S, G)– F: {L I, TR L, TI R, LS G, TS R, TRI LR}
– Are all FDs necessary? No !• TR L, TI R then TRI LR
– Canonical cover of F• Fc= {L I, TR L, TI R, TS R, LS G}
– Key: (ST)– Key attributes: S, T
(1) Eliminate redundant FD, resulting in a canonical cover Fc of F
28
Decomposition into 3NF• Example
– R = (L, I, T, R, S, G)– Fc = {L I, TR L, TI R, TS R, LS G}– Key attributes: S, T
– Decomposition in 3NF
• R1 = (L, I) R2 = (T, R, L)• R3 = (T, I, R) R4 = (L, S, G)• R5 = (S, T, R)
(3) If the key K of R does not occur in any relation Ri, create one more relation Ri=K, but it does.
(2) Create a relation Ri = XA for each FD X A in Fc
29
Decomposition into 3NF
• 3NF Decomposition Algorithm– Proof of Correctness
3NF decomposition algorithm is lossless join, dependency preserving decomposition into 3NF1. Dependency preserving
2. Lossless join
3. 3NF
30
Proof: Decomposition into 3NF is dependency preserving
• 3NF Decomposition Algorithm
– Decomposition is dependency preserving• 3NF decomposition algorithm is dependency
preserving since there is a relation for every FD in Fc.
31
Proof: Decomposition into 3NF is a lossless join
• 3NF Decomposition Algorithm
– Decomposition is lossless join• Lossless join decomposition
– A decomposition {R1, R2} is a lossless-join decompositionif R1 R2 R1 or R1 R2 R2
• Idea:
– A candidate key (K) is in one of the relations Ri in decomposition (last step of algorithm guarantees this)
– Closure of candidate key under Fc must contain all attributes in R (definition of candidate key)
– Follow the steps of attribute closure algorithm (Fig. 7.9)to show that the sufficient lossless join condition is satisfied for K+.
32
Proof: Decomposition into 3NF is actually 3NF!
• 3NF Decomposition Algorithm– Decomposition into 3NF
• Claim– If a relation Ri is in the decomposition generated by the
synthesis algorithm, then Ri is in 3NF
• Idea– To test for 3NF, it is sufficient to consider the functional
dependencies whose right-hand side is a single attribute– Therefore to see that Ri is in 3NF, we must show that
any functional dependency that holds in Ri, satisfies the definition of 3NF
33
Proof: Decomposition into 3NF is actually 3NF!• 3NF Decomposition Algorithm
– Decomposition into 3NF• Demonstration
– Assume is the dependency that generated Ri in the algorithm
– B must be in or , since B is in Ri and generated Ri
– Let us consider two possible cases» B is in but not » B is in but not
34
Proof: 3NF Decomposition is 3NF!
• 3NF Decomposition Algorithm– Decomposition into 3NF
• Demonstration– B is in but not in :
must be superkey (why?)» The second condition of 3NF is satisfied
– B is in but not in is a candidate key» The third alternative in the definition of 3NF is
satisfied» Note: we cannot show that is a superkey. This
shows exactly why the third alternative is present in the definition of 3NF
Three conditions for 3NF:• is a trivial functional dependency • contains a key for R • Every B is part of some candidate key of R
35
Decomposition into 3NF
• B is in Assume is not a superkey
must contain some attribute not in 1. Since B is in F+ it must be derivable from Fc, by using
attribute closure on 2. Attribute closure cannot have used
if it had been used, must be contained in the attribute closure of , which is not possible, since we assumed is not a superkey
3. Now, using (- {B}) and B, we can derive B (since , and since B is non-trivial)
4. Then, B is extraneous in the right-hand side of ; which is not possible since is in Fc (contradiction!)
5. Thus, if B is in then must be a superkey
: FD in R : FD that was used to generated Ri
36
Comparison of BCNF and 3NF• BCNF or 3NF?
– Relations in BCNF and 3NF• Relations in BCNF: no repetition of information• Relations in 3NF: problem of repetition of information
– Decomposition in BCNF and in 3NF• It is always possible to decompose a relation into
relations in 3NF and – the decomposition is lossless – dependencies are preserved
• It is always possible to decompose a relation into relations in BCNF and
– the decomposition is lossless– the information is not repeated
37
Compare BCNF and 3NF• To summarize
– Design Goals• Goal for a relational database design is:
– BCNF (no redundant information)– Lossless join– Dependency preservation
• If we cannot achieve this, we accept:– 3NF (possible repetition of information)– Lossless join– Dependency preservation
38
Summary
• We have learned:– LLJ– DP– BCNF + algorithm– 3rd NF + algorithm
39
… to follow
Relational Algebra, revisited
40
Questions?