garapatiavinash
Normalization
• Normalization is the process of efficiently
organizing data in a database with two
goals in mind
• First goal: eliminate redundant data
– for example, storing the same data in more
than one table
• Second Goal: ensure data dependencies
make sense
– for example, only storing related data in a
table
Benefits of Normalization
• Less storage space
• Quicker updates
• Less data
inconsistency
• Clearer data
relationships
• Easier to add data
• Flexible Structure
Bad database design results in:
• redundancy: inefficient storage
• anomalies: data inconsistency, difficulties in maintenance
Example
Relational schema: Product(Name, Price, Category, Manufacturer)
Instance:
Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi
First Normal Form (1NF)
• A database schema is in First Normal Form if all tables are flat
Not flat:
Student
Name   GPA  Courses
Alice  3.8  Math, DB, OS
Bob    3.7  DB, OS
Carol  3.9  Math, OS
Flat (may need to add keys):
Student
Name   GPA
Alice  3.8
Bob    3.7
Carol  3.9
Course
Course
Math
DB
OS
Takes
Student  Course
Alice    Math
Carol    Math
Alice    DB
Bob      DB
Alice    OS
Carol    OS
Functional Dependencies
• A form of constraint
– hence, part of the schema
• Finding them is part of the database
design
• Also used in normalizing the relations
• Warning: this is the most abstract, and
“hardest” part of the database design.
Functional Dependencies
Definition:
If two tuples agree on the attributes A1, A2, …, An
then they must also agree on the attributes B1, B2, …, Bm
Formally:
A1, A2, …, An → B1, B2, …, Bm
Functional dependency between A and B
Examples
• EmpID → Name, Phone, Position
• Position → Phone
• but not Phone → Position
EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer
In General
• To check A → B, erase all other columns
• check if the remaining relation is many-one
(called functional in mathematics)
… A … B
X1 Y1
X2 Y2
… …
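The check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fd_holds` and the list-of-dicts representation are assumptions chosen for the example, and the data is the employee table from the previous slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    The FD holds when the projection onto lhs + rhs is many-one.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key;
        # any later, different value breaks the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True


employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": 1234, "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": 1234, "Position": "Lawyer"},
]

print(fd_holds(employees, ["Position"], ["Phone"]))   # Position -> Phone
print(fd_holds(employees, ["Phone"], ["Position"]))   # Phone -> Position
```

Note that a check like this only shows that an FD is satisfied by one instance; whether it is a constraint of the schema is a design decision.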
Example
EmpID Name Phone Position
E0045 Smith 1234 Clerk
E1847 John 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer
Position → Phone
Typical Examples of FDs
Product: name → price, manufacturer
Person: ssn → name, age
Company: name → stockprice, president
Example
Product(name, category, color, department, price)
Consider these FDs:
name → color
category → department
color, category → price
What do they say?
Example
FD’s are constraints on relations:
• On some instances they hold
• On others they don’t
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Green Toys 99
Does this instance satisfy all the FDs?
name → color
category → department
color, category → price
Example
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Black Toys 99
Gizmo Stationery Green Office-supp. 59
What about this one?
name → color
category → department
color, category → price
Example
If some FDs are satisfied, then
others are satisfied too
If all these FDs are true:
name → color
category → department
color, category → price
Then this FD also holds: name, category → price
Why?
Inference Rules for FD’s
Splitting rule and Combining rule
A1, A2, …, An → B1, B2, …, Bm
Is equivalent to
A1, A2, …, An → B1
A1, A2, …, An → B2
. . . . .
A1, A2, …, An → Bm
Inference Rules for FD’s
(continued)
Trivial Rule
A1, A2, …, An → Ai
where i = 1, 2, ..., n
Why?
Inference Rules for FD’s
(continued)
Transitive Closure Rule
If
A1, A2, …, An → B1, B2, …, Bm
and
B1, B2, …, Bm → C1, C2, …, Cp
then
A1, A2, …, An → C1, C2, …, Cp
Why?
Functional Dependencies
We use functional dependencies to:
• test relations to see if they are legal under a
given set of functional dependencies.
If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
• specify constraints on the set of legal relations.
We say that F holds on R if all legal relations on R
satisfy the set of functional dependencies F.
Functional Dependencies
• K is a superkey for relation schema R if and only if K → R
• K is a candidate key for R if and only if
– K → R, and
– for no α ⊂ K, α → R
• Functional dependencies allow us to express constraints that
cannot be expressed using superkeys. Consider the schema:
bor_loan = (customer_id, loan_number, amount)
We expect this functional dependency to hold:
loan_number → amount
but would not expect the following to hold:
amount → customer_id
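Functional Dependencies
• A functional dependency is trivial if it is satisfied by all instances of a relation, that is, if its right-hand side is a subset of its left-hand side
– Example:
• customer_name, loan_number → customer_name
• customer_name → customer_name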
Functional Dependencies
• Consider the relation:
PLOTS (prop#, state, plot#, area, price, Tax_rate)
Information about plots available in India. The constraints on
the relation are:
– Prop# is unique throughout India
– Plot# values are unique within a given state
– For a given state, tax_rate is fixed
– Plots having the same area have the same price,
irrespective of the state in which they are located
• Write all the FDs on the relation PLOTS
Functional Dependencies
PLOTS (Prop#, State, Plot#, Area, Price, Tax_rate)
[Dependency diagram: FD1 (PK), FD2 (CK), FD3, FD4]
• Identify redundancy in PLOTS
• Identify update anomalies in PLOTS
Functional Dependencies
PLOTS
FD1 (PK): Prop# → State, Plot#, Area, Price, Tax_rate
FD2 (CK): State, Plot# → Prop#, Area, Price, Tax_rate
FD3: State → Tax_rate
FD4: Area → Price
Dependency Diagram (1NF)
Figure 4.4
Conversion to 1NF
• A relational schema R is in first normal form if the
domains of all attributes of R are atomic
• Repeating groups must be eliminated
– Proper primary key developed
• Uniquely identifies attribute values (rows)
• Combination of PROJ_NUM and EMP_NUM
– Dependencies can be identified
• Desirable dependencies based on primary key
• Less desirable dependencies
– Partial
» based on part of composite primary key
– Transitive
» one nonprime attribute depends on another nonprime attribute
1NF Summarized
• Each attribute must be atomic (single value)
• No repeating columns within a row (composite attributes)
• No multi-valued columns.
• All key attributes defined
• All attributes dependent on primary key
• 1NF simplifies attributes
• Queries become easier.
Conversion to 2NF
• Start with 1NF format:
• Write each key component on separate line
• Write original key on last line
• Each component is new table
• Write dependent attributes after each key
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
Second Normal Form (2NF)
Each attribute must be functionally dependent on the primary key.
• If the primary key is a single attribute, then the relation is in 2NF
• The test for 2NF involves testing for FDs whose left-hand-side attributes are part of the primary key
• Disallow partial dependencies, where non-key attributes depend on part of a composite primary key
• In short, remove partial dependencies
2NF improves data integrity.
• Prevents update, insert, and delete anomalies.
2NF Conversion Results
Figure 4.5
2 NF
• Based on the concept of Full FDs (FFD)
• If A & B are sets of attributes of R, B is said to be FFD on A if A → B, but no proper subset of A determines B
• No partial dependencies on the PK
• Is PLOTS in 2NF?
• YES
• Single attribute PK
• All relations with a single-attribute PK are in 2 NF!
• 2 NF applies to relations with composite keys
• A relation that is in 1NF & in which every non-PK
attribute is fully functionally dependent on
the PK, is said to be in 2 NF
1 NF → (remove all partial dependencies) → 2 NF
2NF Summarized
• In 1NF
• Includes no partial dependencies
– No attribute dependent on a portion of primary
key
• Still possible to exhibit transitive dependency
– Attributes may be functionally dependent on
nonkey attributes
Conversion to 3NF
• Create separate table(s) to eliminate
transitive functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
3 NF
• Based on the concept of transitive dependency
• No non-PK attribute should be transitively dependent on the PK
• Transitive Dependency
If A → B & B → C, then A transitively determines C through B, provided B & C do not determine A
• Is PLOTS in 3NF?
• NO
3 NF
PLOTS (Prop#, State, Plot#, Area, Price, Tax_rate)
[Dependency diagram: FD1 (PK), FD2 (CK), FD3, FD4]
Prop# transitively determines tax_rate through state
Prop# transitively determines price through area
• A relation that is in 1NF & 2 NF & in which no non-PK
attribute is transitively dependent on the PK,
is said to be in 3 NF
2 NF → (remove all transitive dependencies) → 3 NF
2NF – Example - 1
• Inventory (Item, Supplier, Cost, Supplier Address)
• We first check if Cost is fully functionally dependent upon
the ENTIRE Primary-Key
• If I know just Item, can I find out Cost?
– No. We can have > 1 supplier for the same product.
• If I know just Supplier, can I find out Cost?
– No. We need to know what the Item is as well.
• So, Cost is fully functionally dependent upon the
ENTIRE Primary-Key
2NF – Example - 2
• Inventory (Item, Supplier, Cost, Supplier Address)
• We then check if Supplier Address is fully functionally
dependent upon the ENTIRE Primary-Key
• If I know just Item, can I find out Supplier Address?
– No. We can have > 1 supplier for the same product.
• If I know just Supplier, can I find out Supplier Address?
– Yes. The supplier’s address does not depend on the
Item.
• So, Supplier Address is NOT fully functionally
dependent upon the ENTIRE Primary-Key, and Inventory is NOT in 2NF
So putting things together
Before:
Inventory (Description, Supplier, Cost, Supplier Address)
After:
Inventory (Description, Supplier, Cost)
Supplier (Name, Supplier Address)
The above relations are now in 2NF since no non-key attribute
depends on only part of a key.
Transitive Dependence
Given a relation R (EmpNo, EName, Salary, Address),
assume the following FDs hold:
EmpNo → EName, EName → Address, EmpNo → Address
Note: Both EName and Address are non-key attributes in R, and
since Address depends on the non-prime attribute EName, which
depends on the primary key (EmpNo), a transitive dependency exists.
Decomposing on EName → Address:
R1 (EmpNo, EName, Salary)
R2 (EName, Address)
Database Normalization
• Boyce-Codd Normal Form (BCNF)
– A relation is in Boyce-Codd normal form
(BCNF) if every determinant in the table is a
candidate key.
(A determinant is any attribute whose value
determines other values within a row.)
– If a table contains only one candidate key,
3NF and BCNF are equivalent.
– BCNF is a special case of 3NF.
A Table That Is In 3NF But Not In BCNF
Figure 5.7
The Decomposition of a Table Structure to Meet BCNF Requirements
Figure 5.8
Sample Data for a BCNF Conversion
Decomposition into BCNF
BCNF
• Based on FDs that take into account all candidate
keys of a relation
• For a relation with only 1 CK, 3NF & BCNF are
equivalent
• A relation is said to be in BCNF if every
determinant is a CK
• Is PLOTS in BCNF?
• NO
Problem 1
• Consider the relation R(A,B,C) with functional dependencies AB → C and C → B.
• Is R in 2NF?
• Is R in 3NF?
• Is R in BCNF?
Closure of a set of FDs
• Given a set of FDs F on a relation R, it may be possible that several other FDs must also hold for R
• For example, if R = (A,B,C) & the FDs A → B & B → C hold in R, then the FD A → C also holds on R
• For a given value of A, there can be only one corresponding value of B, & for that value of B, there can be only one corresponding value for C
• The closure of F is the set of all FDs that can be inferred from F, & is denoted by F+
Closure of a set of FDs
• It is not sufficient to consider just the given set of FDs
• We need to consider all FDs that hold
• Given F, more FDs can be inferred
• Such FDs are said to be logically implied by F
• F+ is the set of all FDs logically implied by F
• We can compute F+ using the formal definition of FD
• If F were large, this process would be lengthy & cumbersome
• Axioms or Rules of Inference provide a simpler technique
• Armstrong's Axioms
Inference Rules for FDs
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X → Y
IR2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X → Y and Y → Z, then X → Z
IR1, IR2, IR3 form a sound & complete set of
inference rules
• Sound: never generate any wrong FD
• Complete: generate all FDs that hold
Inference Rules for FDs
Some additional inference rules that are
useful:
Decomposition: If X → YZ, then X → Y & X → Z
Union: If X → Y & X → Z, then X → YZ
Pseudotransitivity: If X → Y & WY → Z, then WX → Z
• The last three inference rules, as well as any other
inference rules, can be deduced from IR1, IR2, and IR3
(completeness property)
Example
• R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H }
• some members of F+
– A → H
• by transitivity from A → B and B → H
– AG → I
• by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
– CG → HI
• by the union rule
Closure of Attribute Sets
• Given a set of attributes α, define the closure of α under F
(denoted by α+) as the set of attributes that are functionally
determined by α under F
• Algorithm to compute α+, the closure of α under F
result := α;
while (changes to result) do
  for each β → γ in F do
  begin
    if β ⊆ result then result := result ∪ γ
  end
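The closure algorithm above translates almost line for line into Python. The following is a minimal sketch under assumptions not in the slides: FDs are represented as a list of (left side, right side) pairs, single-letter attributes are written as strings such as "CG", and the function name `closure` is chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is any iterable of
    attribute names (a string of single-letter attributes also works).
    """
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:            # "for each beta -> gamma in F"
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(closure("AG", F)))
```

With multi-character attribute names, each side would need to be passed as a set or tuple of names rather than a plain string.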
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = {A → B, A → C, CG → H, CG → I, B → H}
• (AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
1. Is AG a superkey?
– Does AG → R? == Is (AG)+ ⊇ R
2. Is any subset of AG a superkey?
– Does A → R? == Is (A)+ ⊇ R
– Does G → R? == Is (G)+ ⊇ R
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
– To test if α is a superkey, we compute α+, and check if α+
contains all attributes of R.
• Testing functional dependencies
– To check if a functional dependency α → β holds (or, in other
words, is in F+), just check if β ⊆ α+.
– That is, we compute α+ by using attribute closure, and then
check if it contains β.
– This is a simple and cheap test, and very useful
• Computing closure of F
– For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+,
we output a functional dependency γ → S.
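The first two uses listed above can be sketched as small helpers built on attribute closure. This is an illustrative sketch only: the names `is_superkey`, `is_candidate_key` and `implies` are assumptions, and single-letter attributes are again written as strings.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of R."""
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    """A superkey is a candidate key if dropping any one attribute breaks it."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )


def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_candidate_key("AG", R, F), implies(F, "A", "H"))
```

Checking only the subsets that drop a single attribute is enough for minimality, because any superset of a superkey is itself a superkey.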
Canonical Cover
• Sets of functional dependencies may have redundant dependencies that can be inferred from the others
– For example: A → C is redundant in: {A → B, B → C, A → C}
– Parts of a functional dependency may be redundant
• E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
• E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
- every FD in F can be inferred from G, &
- every FD in G can be inferred from F
• Hence, F and G are equivalent if F+=G+
Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs
Extraneous Attributes
• Consider a set F of functional dependencies and the functional dependency α → β in F.
– Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
– Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
• Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one
• Example: Given F = {A → C, AB → C}
– B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e. the result of dropping B from AB → C).
• Example: Given F = {A → C, AB → CD}
– C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α
1. compute ({α} – A)+ using the dependencies in F
2. check that ({α} – A)+ contains β; if it does, A is
extraneous
• To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
F’ = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous
Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
– F logically implies all dependencies in Fc, and
– Fc logically implies all dependencies in F, and
– No functional dependency in Fc contains an extraneous attribute, and
– Each left side of a functional dependency in Fc is unique.
• To compute a canonical cover for F:
repeat
  Use the union rule to replace any dependencies in F
    α1 → β1 and α1 → β2 with α1 → β1 β2
  Find a functional dependency α → β with an
    extraneous attribute either in α or in β
  If an extraneous attribute is found, delete it from α → β
until F does not change
• Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied
Computing Canonical Cover
• R = (A, B, C)
F = {A → BC, B → C, A → B, AB → C}
• Combine A → BC and A → B into A → BC
– Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
– Check if the result of deleting A from AB → C is implied by the other dependencies
• Yes: in fact, B → C is already present!
– Set is now {A → BC, B → C}
• C is extraneous in A → BC
– Check if A → C is logically implied by A → B and the other dependencies
• Yes: using transitivity on A → B and B → C.
– Can use attribute closure of A in more complex cases
• The canonical cover is: A → B, B → C
Decomposition
1. Decomposing the schema
R = (bname, bcity, assets, cname, lno, amt)
R1 = (bname, bcity, assets, cname), R2 = (cname, lno, amt)
R = R1 ∪ R2
2. Decomposing the instance
bname bcity assets cname lno amt
Downtown Bkln 9M Jones L-17 1000
Downtown Bkln 9M Johnson L-23 2000
Mianus Horse 1.7M Jones L-93 500
Downtown Bkln 9M Hayes L-17 1000
bname bcity assets cname
Downtown Bkln 9M Jones
Downtown Bkln 9M Johnson
Mianus Horse 1.7M Jones
Downtown Bkln 9M Hayes
cname lno amt
Jones L-17 1000
Johnson L-23 2000
Jones L-93 500
Hayes L-17 1000
Goals of Decomposition
1. Lossless Joins
Want to be able to reconstruct big (e.g. universal) relation by
joining smaller ones (using natural joins)
(i.e. R1 ⋈ R2 = R)
2. Dependency preservation
Want to minimize the cost of global integrity constraints based on FD’s
(i.e. avoid big joins in assertions)
3. Redundancy Avoidance
Avoid unnecessary data duplication (the motivation for decomposition)
Why important?
LJ: information loss
DP: efficiency (time)
RA: efficiency (space), update anomalies
Lossy Decomposition
Original relation:
A B C
1 2 3
4 5 6
7 2 8
Projections:
A B
1 2
4 5
7 2
B C
2 3
5 6
2 8
JOIN of the projections (the last two rows are spurious tuples):
A B C
1 2 3
4 5 6
7 2 8
1 2 8
7 2 3
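The spurious tuples in this example can be reproduced with a few set comprehensions. This is only an illustrative sketch: tuples are written as (A, B, C) and the variable names are chosen for the example.

```python
# Original instance r(A, B, C) from the slide.
r = {(1, 2, 3), (4, 5, 6), (7, 2, 8)}

# Project onto (A, B) and (B, C).
ab = {(a, b) for a, b, c in r}
bc = {(b, c) for a, b, c in r}

# Natural join of the two projections on the shared attribute B.
joined = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}

# Tuples that appear in the join but were never in r.
spurious = joined - r
print(sorted(joined))
print(sorted(spurious))
```

The join is a strict superset of r because B = 2 pairs with two different A values and two different C values, so the join cannot tell which combinations were originally present.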
Decomposition Goal #1: lossless joins
A bad decomposition:
bname bcity assets cname
Downtown Bkln 9M Jones
Downtown Bkln 9M Johnson
Mianus Horse 1.7M Jones
Downtown Bkln 9M Hayes
cname lno amt
Jones L-17 1000
Johnson L-23 2000
Jones L-93 500
Hayes L-17 1000
=
bname bcity assets cname lno amt
Downtown Bkln 9M Jones L-17 1000
Downtown Bkln 9M Jones L-93 500
Downtown Bkln 9M Johnson L-23 2000
Mianus Horse 1.7M Jones L-17 1000
Mianus Horse 1.7M Jones L-93 500
Downtown Bkln 9M Hayes L-17 1000
Problem: join adds meaningless tuples
“lossy join”: by adding noise, we have lost meaningful information as a
result of the decomposition
Decomposition Goal #1: lossless joins
Is the following decomposition lossless or lossy?
bname assets cname lno
Downtown 9M Jones L-17
Downtown 9M Johnson L-23
Mianus 1.7M Jones L-93
Downtown 9M Hayes L-17
lno bcity amt
L-17 Bkln 1000
L-23 Bkln 2000
L-93 Horse 500
Ans: Lossless: R = R1 ⋈ R2, it has 4 tuples
Ensuring Lossless Joins
A decomposition of R : R = R1 ∪ R2
is lossless iff
R1 ∩ R2 → R1, or
R1 ∩ R2 → R2
(i.e., intersecting attributes must form a superkey for
one of the resulting smaller relations)
Lossless Decomposition
Theorem
A decomposition of R into R1 and R2 is lossless join wrt FDs F, if and only if at least one of the following dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, R1 ∩ R2 forms a superkey of
either R1 or R2
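The theorem gives a direct test for binary decompositions: compute the closure of the common attributes and see whether it covers R1 or R2. A minimal sketch follows; the function name `is_lossless` and the data representation are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common = set(r1) & set(r2)
    reachable = closure(common, fds)
    return set(r1) <= reachable or set(r2) <= reachable


F = [("A", "B"), ("B", "C")]
print(is_lossless("AB", "BC", F))   # common attribute B determines BC
print(is_lossless("AB", "BC", []))  # no FDs, so B determines nothing else
```

This test covers only decompositions into two relations, which is the case the theorem addresses.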
Lossy Decomposition
S
S# Status City
S3 30 Paris
S5 30 Athens
First decomposition:
S# Status
S3 30
S5 30
S# City
S3 Paris
S5 Athens
Second decomposition:
S# Status
S3 30
S5 30
Status City
30 Paris
30 Athens
Lossless Decomposition
• Observe that S satisfies the FDs:
– S# → Status & S# → City
• It cannot be a coincidence that S is equal to the
join of its projections on {S#, Status} & {S#, City}
• Heath’s Theorem:
Let R{A,B,C} be a relation, where A, B, & C are
sets of attributes. If R satisfies A → B & A → C,
then R is equal to the join of its projections on
{A,B} & {A,C}
• Observe that in the second decomposition of S
the FD S# → City is lost
Lossless Decomposition
• The decomposition of R into R1, R2, …, Rn is lossless if for
any instance r of R
r = ΠR1(r) ⋈ ΠR2(r) ⋈ … ⋈ ΠRn(r)
• We can replace R by R1 & R2, knowing that the instance of
R can be recovered from the instances of R1 & R2
• We can use FDs to show that decompositions are lossless
Decomposition Goal #2: Dependency preservation
Goal: efficient integrity checks of FD’s
An example w/ no DP:
R = (bname, bcity, assets, cname, lno, amt)
bname → bcity assets
lno → amt bname
Decomposition: R = R1 ∪ R2
R1 = (bname, assets, cname, lno)
R2 = (lno, bcity, amt)
Lossless but not DP. Why?
Ans: bname → bcity assets crosses 2 tables
Decomposition Goal #2: Dependency preservation
To ensure best possible efficiency of FD checks,
ensure that only a SINGLE table is needed in order to check each FD
i.e. ensure that: A1 A2 ... An → B1 B2 ... Bm
can be checked by examining Ri = ( ..., A1, A2, ..., An, ..., B1, ..., Bm, ...)
To test if the decomposition R = R1 ∪ R2 ∪ ... ∪ Rn is DP
(1) see which FD’s of R are covered by R1, R2, ..., Rn
(2) compare the closure of (1) with the closure of FD’s of R
Decomposition Goal #2: Dependency preservation
Example: Given F = {A → B, AB → D, C → D}
consider R = R1 ∪ R2 s.t.
R1 = (A, B, D), R2 = (C, D)
(1) F+ = {A → BD, C → D}+
(2) G+ = {A → BD, C → D, ...}+
(3) F+ = G+
note: G+ cannot introduce new FDs not in F+
Decomposition is DP
Dependency Preservation
• Let Fi be the set of dependencies in F+ that include only attributes in Ri.
• A decomposition is dependency preserving, if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
• If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Testing for Dependency Preservation
• To check if a dependency α → β is preserved in a
decomposition of R into R1, R2, …, Rn we apply the following
test (with attribute closure done with respect to F)
– result = α
while (changes to result) do
  for each Ri in the decomposition
    t = (result ∩ Ri)+ ∩ Ri
    result = result ∪ t
– If result contains all attributes in β, then the functional dependency
α → β is preserved.
• We apply the test on all dependencies in F to check if a
decomposition is dependency preserving
• This procedure takes polynomial time, instead of the
exponential time required to compute F+ and (F1 ∪ F2 ∪ …
∪ Fn)+
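The polynomial-time test above can be sketched as follows. The function names `preserves` and `is_dependency_preserving` are assumptions for illustration; attribute closure is always taken with respect to the full set F, as the slide specifies.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs can be enforced without joining the Ri."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # t = (result ∩ Ri)+ ∩ Ri, closure taken under F
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(decomposition, fds):
    return all(preserves(l, r, decomposition, fds) for l, r in fds)


F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["AB", "BC"], F))
print(is_dependency_preserving(["AB", "AC"], F))
```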
Example
• R = (A, B, C)
F = {A → B, B → C}
– Can be decomposed in two different ways
• R1 = (A, B), R2 = (B, C)
– Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
– Dependency preserving
• R1 = (A, B), R2 = (A, C)
– Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
– Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
Decomposition Goal #3: Redundancy Avoidance
Example:
A B C
a x 1
e x 1
g y 2
h y 2
m y 2
n z 1
p z 1
Redundancy for B = x, y and z
(1) An FD that exists in the above relation is: B → C
(2) A superkey in the above relation is A (or any set containing A)
When do you have redundancy?
Ans: when there is some FD X → Y covered by a relation
and X is not a superkey
Problems with Decompositions
There are three potential problems to consider:
– Some queries become more expensive
• e.g., What is the price of prop# 1?
– Given instances of the decomposed relations, we
may not be able to reconstruct the corresponding
instance of the original relation!
• Fortunately, not in the PLOTS example
– Checking some dependencies may require joining the
instances of the decomposed relations.
• Fortunately, not in the PLOTS example
Tradeoff: Must consider these issues vs. redundancy
Example
• R = (A, B, C)
F = {A → B
B → C}
Key = {A}
• R is not in BCNF (B → C but B is not a
superkey)
• Decomposition R1 = (A, B), R2 = (B, C)
– R1 and R2 in BCNF
– Lossless-join decomposition
– Dependency preserving
Testing for BCNF
• To check if a non-trivial dependency α → β causes a violation of BCNF
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
• Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.
– If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
• However, the simplified test using only F is incorrect when testing a relation in a decomposition of R
– Consider R = (A, B, C, D, E), with F = {A → B, BC → D}
• Decompose R into R1 = (A, B) and R2 = (A, C, D, E)
• Neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF.
• In fact, dependency AC → D in F+ shows R2 is not in BCNF.
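The simplified test can be sketched as a small function that lists the dependencies in F whose left side is not a superkey. This is an illustrative sketch with assumed names; as the slide warns, it is only valid for the original schema R with its given F, not for a relation inside a decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs in fds whose left side is not a superkey of schema."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # skip trivial FDs
        and not closure(lhs, fds) >= schema       # lhs+ must cover R
    ]


print(bcnf_violations("ABC", [("A", "B"), ("B", "C")]))
```

For the pitfall example on the slide, AC → D is found by closure rather than by scanning F: under F = {A → B, BC → D}, the closure of AC contains D but not E, so AC is not a superkey of (A, C, D, E).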
BCNF and Dependency Preservation
• R = (J, K, L)
F = {JK → L
L → K}
Two candidate keys = JK and JL
• R is not in BCNF
• Any decomposition of R will fail to preserve
JK → L
This implies that testing for JK → L requires a
join
It is not always possible to get a BCNF decomposition that is
dependency preserving
Third Normal Form: Motivation
• There are some situations where
– BCNF is not dependency preserving, and
– efficient checking for FD violation on updates is
important
• Solution: define a weaker normal form, called Third
Normal Form (3NF)
– Allows some redundancy (with resultant problems; we
will see examples later)
– But functional dependencies can be checked on
individual relations without computing a join.
– There is always a lossless-join, dependency-
preserving decomposition into 3NF.
Redundancy in 3NF
• There is some redundancy in this schema
• Example of problems due to redundancy in 3NF
– R = (J, K, L)
F = {JK → L, L → K}
J     L   K
j1    l1  k1
j2    l1  k1
j3    l1  k1
null  l2  k2
• repetition of information (e.g., the relationship l1, k1)
• need to use null values (e.g., to represent the relationship
l2, k2 where there is no corresponding value for J)
Testing for 3NF
• Optimization: Need to check only FDs in F, need not check all FDs
in F+.
• Use attribute closure to check, for each dependency α → β, if α is a
superkey.
• If α is not a superkey, we have to verify if each attribute in β is
contained in a candidate key of R
– this test is rather more expensive, since it involves finding
candidate keys
– testing for 3NF has been shown to be NP-hard
– Interestingly, decomposition into third normal form (described
shortly) can be done in polynomial time
3NF Decomposition Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
  if none of the schemas Rj, 1 ≤ j ≤ i contains α β
  then begin
    i := i + 1;
    Ri := α β
  end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
  i := i + 1;
  Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
repeat
  if any schema Rj is contained in another schema Rk
  then /* delete Rj */
    Rj := Ri;
    i := i - 1;
until no more schemas can be deleted
return (R1, R2, ..., Ri)
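The synthesis algorithm above can be sketched in Python. This is a minimal sketch under stated assumptions: the input `fc` is taken to already be a canonical cover (computing it is not shown here), and the helper names `candidate_key` and `synthesize_3nf` are chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_key(schema, fds):
    """Shrink the full attribute set until no attribute can be dropped."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return frozenset(key)


def synthesize_3nf(schema, fc):
    """3NF synthesis; fc is assumed to be a canonical cover already."""
    schemas = []
    for lhs, rhs in fc:
        s = frozenset(lhs) | frozenset(rhs)
        if not any(s <= rj for rj in schemas):
            schemas.append(s)
    # A schema contains a candidate key exactly when it is a superkey of R.
    if not any(closure(rj, fc) >= set(schema) for rj in schemas):
        schemas.append(candidate_key(schema, fc))
    # Optionally remove schemas strictly contained in another schema.
    return [rj for rj in schemas if not any(rj < rk for rk in schemas)]


print(synthesize_3nf("JKL", [("JK", "L"), ("L", "K")]))
```

For R = (J, K, L) the result is the single schema (J, K, L), which matches the earlier observation that this relation is in 3NF but not in BCNF.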
Testing Decomposition for BCNF
• To check if a relation Ri in a decomposition of R is in BCNF,
– Either test Ri for BCNF with respect to the restriction of F
to Ri (that is, all FDs in F+ that contain only attributes from
Ri)
– or use the original set of dependencies F that hold on R, but
with the following test:
– for every set of attributes α ⊆ Ri, check that α+ (the
attribute closure of α) either includes no attribute of
Ri – α, or includes all attributes of Ri.
• If the condition is violated by some α → β in F, the
dependency α → (α+ – α) ∩ Ri
can be shown to hold on Ri, and Ri violates BCNF.
• We use the above dependency to decompose Ri
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
  if (there is a schema Ri in result that is not in BCNF)
  then begin
    let α → β be a nontrivial functional dependency that holds on Ri
      such that α → Ri is not in F+, and α ∩ β = ∅;
    result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
  end
  else done := true;
Note: each Ri is in BCNF, and decomposition is lossless-join.
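The decomposition loop can be sketched in Python. Instead of computing F+, this sketch uses the subset test from the previous slide (for every α ⊆ Ri, the closure of α must add either nothing from Ri or all of Ri), which is exponential in the size of Ri and therefore only suitable for small schemas. The function names are assumptions, and the resulting decomposition depends on which violation is picked first.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on ri, or None."""
    ri = frozenset(ri)
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = frozenset(combo)
            # beta = (alpha+ - alpha) ∩ Ri, so alpha and beta are disjoint.
            beta = (closure(alpha, fds) - alpha) & ri
            if beta and not (alpha | beta) >= ri:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                # Replace Ri by (Ri - beta) and (alpha, beta).
                result += [ri - beta, alpha | beta]
                done = False
                break
    return result


print(bcnf_decompose("ABC", [("A", "B"), ("B", "C")]))
```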
Example of BCNF Decomposition
• class (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id)
• Functional dependencies:
– course_id→ title, dept_name, credits
– building, room_number→capacity
– course_id, sec_id, semester, year→building, room_number, time_slot_id
• A candidate key {course_id, sec_id, semester, year}.
• BCNF Decomposition:
– course_id→ title, dept_name, credits holds
• but course_id is not a superkey.
– We replace class by:
• course(course_id, title, dept_name, credits)
• class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)
BCNF Decomposition (Cont.)
• course is in BCNF
– How do we know this?
• building, room_number→capacity holds on class-1
– but {building, room_number} is not a superkey for class-1.
– We replace class-1 by:
• classroom (building, room_number, capacity)
• section (course_id, sec_id, semester, year, building,
room_number, time_slot_id)
• classroom and section are in BCNF.
4 NF
• BCNF removes any anomalies due to FDs
• Further research has led to the identification of
another type of dependency called Multi-valued
Dependency (MVD)
• Proposed by R Fagin* in 1977
• MVDs can also cause data redundancy
• MVDs are a generalization of FDs
* R Fagin: “Multi-valued Dependencies & a new normal form for
relational databases,” ACM TODS 2, No. 3 (Sept. 1977)
4 NF
• Consider the following relation:
• In relational databases, repeating groups are not
allowed
Course    Teacher                    Texts
DBS       N Goyal, J P Misra, Yash   Garcia, Korth, Elmasiri, Raghu
Networks  S Mohan, Rahul, J P Misra  Tannenbaum, Keshav, Petterson
4 NF
• 1 NF Version
COURSE TEACHER TEXTS
DBS N GOYAL GARCIA
DBS N GOYAL KORTH
DBS N GOYAL ELMASIRI
DBS N GOYAL RAGHU R
DBS J P MISRA GARCIA
DBS J P MISRA KORTH
DBS J P MISRA ELMASIRI
DBS J P MISRA RAGHU R
NETWORKS S MOHAN TANNENBAUM
NETWORKS S MOHAN KESHAV
NETWORKS S MOHAN KUROSE
NETWORKS RAHUL TANNENBAUM
NETWORKS RAHUL KESHAV
NETWORKS RAHUL KUROSE
CTX
4 NF
• ANY REDUNDANCY? ANY ANOMALIES?
COURSE TEACHER TEXTS
DBS N GOYAL GARCIA
DBS N GOYAL KORTH
DBS N GOYAL ELMASIRI
DBS N GOYAL RAGHU R
DBS J P MISRA GARCIA
DBS J P MISRA KORTH
DBS J P MISRA ELMASIRI
DBS J P MISRA RAGHU R
NETWORKS S MOHAN TANNENBAUM
NETWORKS S MOHAN KESHAV
NETWORKS S MOHAN PETTERSON
NETWORKS RAHUL TANNENBAUM
NETWORKS RAHUL KESHAV
NETWORKS RAHUL PETTERSON
CTX
4 NF
• Redundancy is due to the constraint that the texts
for a course are independent of the instructors
• This constraint cannot be expressed in terms of
FDs
• Example of MVD
• Is CTX in BCNF?
• New Teacher for DBS
• New Text for Networks
• Teacher teaching DBS leaves
4 NF
• Decompose CTX into CT & TX
CT
COURSE TEACHER
DBS N GOYAL
DBS J P MISRA
DBS S JAGADISH
NETWORKS S MOHAN
NETWORKS RAHUL
NETWORKS J P MISRA
TX
COURSE TEXT
DBS GARCIA
DBS KORTH
DBS ELMASIRI
DBS RAGHU R
NETWORKS TANNENBAUM
NETWORKS KESHAV
NETWORKS PETTERSON
4 NF
• Decompose CTX into CT & TX is not done on the
basis of FDs
• Decompose CTX into CT & TX is done on the basis
of MVDs
• MVDs
Represents a dependency between attributes of a relation,
such that for every value of A, there is a set of values of B &
a set of values of C, The set of values for B & C are
independent of each other
course →→ teacher
course →→ text
Multi-Valued Dependencies
• A multi-valued dependency occurs when a
determinant determines more than one
dependent, and the dependents are
independent of each other
• Example course implies teacher; course implies
text, where teacher and text are independent
• A relation with course, instructor and text is all
key, and exhibits redundancy, but is in 3NF
• Updates can exhibit anomalies
101
4 NF
• An MVD A →→ B is trivial if
(a) B ⊆ A or
(b) A ∪ B = R
• A relation that is in BCNF & contains no non-trivial
MVDs is said to be in 4NF
• CTX is not in 4NF because course →→ teacher is a
non-trivial MVD
102
Fourth Normal Form
• Relation R is in 4 NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial multi-valued dependency A multi-determines B is satisfied, then all attributes of R are also functionally dependent on A
• In the previous example, decompose (course, instructor, text) into two relations: (course, instructor) and (course, text)
103
Multi-Valued Dependencies
• An MVD is an assertion that 2 attributes or sets of attributes are independent of each other
• Generalization of the concept of FD in the sense that every FD implies a corresponding MVD
• Independence of attribute sets cannot be explained using FDs
• So what causes MVDs?
• Role of MVDs in database schema design
104
Multi-Valued Dependencies
• Most common source of redundancy in BCNF schemas is to put 2 or more M:M relationships in a single relation
• Note that in CTX, there are no non-trivial FDs
• If you fix the values for one set of attributes, then the values in certain other attributes are independent of all the other attributes in the relation
Multivalued Dependencies (MVDs)
• Let R be a relation schema and let α ⊆ R and β ⊆ R.
The multivalued dependency
α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist
tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
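The definition translates almost line by line into a check on a relation instance. The following is a sketch under stated assumptions: rows are tuples, attrs gives the column names in order, and the function name holds_mvd is hypothetical. It tests one instance only; a schema-level MVD must hold on every legal instance.

```python
from itertools import product


def holds_mvd(rows, attrs, alpha, beta):
    """Check whether alpha ->> beta holds on the given set of tuples.

    rows        : iterable of tuples
    attrs       : attribute names, in column order
    alpha, beta : collections of attribute names
    """
    rows = set(rows)
    a_idx = [attrs.index(a) for a in alpha]
    b_idx = [attrs.index(b) for b in beta]

    for t1, t2 in product(rows, repeat=2):
        if any(t1[i] != t2[i] for i in a_idx):
            continue  # only pairs that agree on alpha are constrained
        # t3 takes beta from t1 and everything else (R - beta) from t2.
        t3 = tuple(t1[i] if i in b_idx else t2[i] for i in range(len(attrs)))
        if t3 not in rows:
            return False
        # t4 is the mirror image of t3; it is tested when the loop
        # reaches the pair (t2, t1).
    return True


attrs = ("course", "teacher", "text")
rows = {
    ("DBS", "N GOYAL", "GARCIA"),
    ("DBS", "N GOYAL", "KORTH"),
    ("DBS", "J P MISRA", "GARCIA"),
    ("DBS", "J P MISRA", "KORTH"),
}
print(holds_mvd(rows, attrs, ["course"], ["teacher"]))
print(holds_mvd(rows - {("DBS", "N GOYAL", "KORTH")}, attrs, ["course"], ["teacher"]))
```

Looping over ordered pairs keeps the code short at the cost of checking each unordered pair twice.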
MVD (Cont.)
• Tabular representation of α →→ β [table not recovered]
107
Formal Definition of MVD
• The MVD
A1 A2 … An →→ B1 B2 … Bm
holds for a relation R if
for each pair of tuples t & u that agree on As, we can find a tuple v that agrees
1. With t & u on As
2. With t on Bs
3. With u on all attributes of R that are not among As & Bs
108
MVD
[Figure: tuples t, v and u drawn over three column groups: the A's, the B's and the others]
109
• 4NF
• 5NF
• 6NF
• DKNF
110
• Fourth Normal Form (4NF)
– Eliminates data redundancy caused by multi-valued
dependencies (MVDs).
– A relation in 4NF may not contain more than one
multi-valued dependency.
111
• MVD?
A multi-valued dependency X →→ Y holds
in a relation R if, whenever we have two
tuples of R that agree on all the attributes
of X, we can swap their Y
components and get two tuples that are
also in R.
112
• Example
• In relation R(A, B, C), how can we find out whether
A →→ B holds?
• If the relation has two tuples
A B C
1 7 4
1 3 2
then that table should also contain
two other tuples where the B's are
swapped:
A B C
1 3 4
1 7 2
Do this for all tuples that have the
same A values.
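The swap rule can also be used the other way round, to list the tuples a table still lacks. The sketch below is specific to R(A, B, C) with A →→ B and assumes rows are (a, b, c) tuples; the function name missing_for_mvd is illustrative.

```python
def missing_for_mvd(rows):
    """For R(A, B, C) and A ->> B, return the tuples that the swap rule
    requires but that are absent from `rows`."""
    rows = set(rows)
    # For each A value, every B seen with it must be paired with every C
    # seen with it.
    required = {
        (a1, b1, c2)
        for (a1, b1, _) in rows
        for (a2, _, c2) in rows
        if a1 == a2
    }
    return required - rows


print(sorted(missing_for_mvd({(1, 7, 4), (1, 3, 2)})))
# [(1, 3, 4), (1, 7, 2)]
```

An empty result means the instance already satisfies A →→ B.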
113
• What is so bad about having a table with
multiple multi-valued dependencies?
• Example: Consider R(Department, Job, Part)
The table has the following MVDs:
Department →→ Part
Department →→ Job
114
• Department d1 works on jobs j1 and j2 with parts p1 and p2
• Department d2 works on jobs j3, j4, and j5 with parts p2 and p4
• Department d3 works on job j2 only with parts p5 and p6
Department Job Part#
-------------------------------------------------
d1 j1 p1
d1 j1 p2
d1 j2 p1
d1 j2 p2
d2 j3 p2
d2 j3 p4
d2 j4 p2
d2 j4 p4
d2 j5 p2
d2 j5 p4
d3 j2 p5
d3 j2 p6
Department →→ Job
Department →→ Part
115
• If you want to add a part to a department, you must create more than one new row.
• Likewise, removing a part or a job by deleting rows can destroy information.
• Updating a part or job name will also require multiple rows to be
changed.
• The solution is to split this table into two tables, one with
(department, job) in it and one with (department, part) in it.
**The only desirable MVDs are the ones whose determinant is a superkey of R.
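The superkey condition can be illustrated on the department table. The sketch below makes two assumptions that are worth stating: the MVD being tested is already known to hold, and "superkey" is approximated by uniqueness in the given rows, although it is really a property of the schema. All function names are hypothetical.

```python
def is_trivial(attrs, lhs, rhs):
    """lhs ->> rhs is trivial if rhs is a subset of lhs or lhs ∪ rhs = R."""
    lhs, rhs = set(lhs), set(rhs)
    return rhs <= lhs or (lhs | rhs) == set(attrs)


def is_superkey(rows, attrs, lhs):
    """Instance-level approximation: no two distinct rows agree on lhs."""
    idx = [attrs.index(a) for a in lhs]
    rows = set(rows)
    return len({tuple(r[i] for i in idx) for r in rows}) == len(rows)


def violates_4nf(rows, attrs, lhs, rhs):
    """Assumes lhs ->> rhs is already known to hold on the relation."""
    return not is_trivial(attrs, lhs, rhs) and not is_superkey(rows, attrs, lhs)


# Department / Job / Part rows from the slide.
DJP = {
    ("d1", "j1", "p1"), ("d1", "j1", "p2"), ("d1", "j2", "p1"), ("d1", "j2", "p2"),
    ("d2", "j3", "p2"), ("d2", "j3", "p4"), ("d2", "j4", "p2"), ("d2", "j4", "p4"),
    ("d2", "j5", "p2"), ("d2", "j5", "p4"), ("d3", "j2", "p5"), ("d3", "j2", "p6"),
}
DJ = {(d, j) for (d, j, _) in DJP}  # the (department, job) table after the split

print(violates_4nf(DJP, ("Department", "Job", "Part"), ["Department"], ["Job"]))
print(violates_4nf(DJ, ("Department", "Job"), ["Department"], ["Job"]))
```

In the two-column table the same MVD is trivial because Department and Job together make up the whole relation, so the split removes the violation.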
Special case: assume R has the following two multi-valued dependencies:
A →→ B and B →→ C
In this case R will be in fourth normal form iff B and C are dependent on each other.
116
A relation R is in 5NF if, for every join dependency that holds on R, at least
one of the following holds.
(a) (R1, R2, ..., Rn) is a trivial join dependency.
(b) Every Ri is a superkey for R.
117
• A table is said to be in 5NF iff it is in
4NF and every join dependency in it is
implied by the candidate keys.
• Sometimes it is impossible to break a table into 2
tables without losing information; that is when the rules of 5NF
are used to normalize.
• A table in 4NF is usually also in 5NF, but
sometimes a real-world constraint causes the
relation not to comply with 5NF.
118
• Join dependencies: they are basically a
generalization of MVDs.
• A join dependency is a condition where the natural join of a
set of projections of R results in the reconstruction of
R.
• If such a condition is present and is not implied by the
candidate keys, then the relation should be replaced with the
tables that consist of its projections.
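A join dependency can be tested on an instance by projecting and re-joining. The sketch below assumes relations are sets of tuples with a separate list of attribute names, and that the components together cover every attribute of R; the function names and the toy data are hypothetical.

```python
def project(rows, attrs, keep):
    """Project tuples with column names `attrs` onto the names in `keep`."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples; returns (rows, attrs)."""
    common = [a for a in left_attrs if a in right_attrs]
    extra = [a for a in right_attrs if a not in left_attrs]
    out = set()
    for l in left:
        for r in right:
            if all(l[left_attrs.index(a)] == r[right_attrs.index(a)] for a in common):
                out.add(l + tuple(r[right_attrs.index(a)] for a in extra))
    return out, list(left_attrs) + extra


def satisfies_jd(rows, attrs, components):
    """True if joining the projections onto `components` gives back `rows`."""
    rows = set(rows)
    joined = project(rows, attrs, components[0])
    joined_attrs = list(components[0])
    for comp in components[1:]:
        joined, joined_attrs = natural_join(
            joined, joined_attrs, project(rows, attrs, comp), list(comp)
        )
    # Put the columns back in the order of `attrs` before comparing.
    return project(joined, joined_attrs, attrs) == rows


attrs = ("A", "B", "C")
R = {("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1"), ("a1", "b1", "c1")}

print(satisfies_jd(R, attrs, [("A", "B"), ("B", "C")]))              # two projections
print(satisfies_jd(R, attrs, [("A", "B"), ("B", "C"), ("C", "A")]))  # three projections
```

For this toy relation no pair of projections reconstructs R, but all three together do, which is the situation 5NF is concerned with.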
119
Each row of the Psychiatrist-to-Insurer-to-Condition
table says that the psychiatrist is able
to offer reimbursable
treatment to patients who
suffer from the given
condition and who are
insured by the given
insurer. The three-attribute table is
necessary in order to
model the situation
correctly.
120
• Suppose, however, that the following rule
applies: When a psychiatrist is authorized
to offer reimbursable treatment to
patients insured by Insurer P, and the
psychiatrist is able to treat condition C,
then – in the event that the Insurer P
covers condition C – it must be true that
the psychiatrist is able to provide
treatment to patients who suffer from
condition C and are insured by Insurer P.
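The rule can be written as a direct check on the three-attribute table. The sketch below assumes rows are (psychiatrist, insurer, condition) tuples and derives the three pairwise facts from the table itself; the function name and the sample data are hypothetical, not taken from the slides.

```python
def missing_under_rule(rows):
    """Return the (psychiatrist, insurer, condition) combinations that the
    rule requires but that the table does not contain."""
    rows = set(rows)
    works_with = {(p, i) for (p, i, _) in rows}  # psychiatrist accepts insurer
    can_treat = {(p, c) for (p, _, c) in rows}   # psychiatrist treats condition
    covers = {(i, c) for (_, i, c) in rows}      # insurer covers condition
    required = {
        (p, i, c)
        for (p, i) in works_with
        for (p2, c) in can_treat
        if p == p2 and (i, c) in covers
    }
    return required - rows


table = {
    ("Dr. A", "Insurer P", "Anxiety"),
    ("Dr. A", "Insurer Q", "Depression"),
    ("Dr. B", "Insurer P", "Depression"),
}
print(missing_under_rule(table))
```

Here Dr. A accepts Insurer P, Dr. A treats Depression, and Insurer P covers Depression, so the rule demands a row for that combination. When the rule holds, the table carries no information beyond its three two-column projections.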
121
These are all the possible projections of the previous table. If
(R1 ⋈ R2) or (R2 ⋈ R3) or (R1 ⋈ R3) results in R, then
there is an MVD (a 4NF issue), and if the natural join of {R1, R2, R3} results in R,
then a JD exists and the original table is not in 5NF.
122
• Only in rare situations does a 4NF table
not conform to 5NF. These are situations
in which a complex real-world constraint
governing the valid combinations of
attribute values in the 4NF table is not
implicit in the structure of that table.
123
Fifth Normal Form
• A relation R is in 5NF – also called
projection-join normal form, if and only if
every nontrivial join dependency that is
satisfied by R is implied by the candidate
key(s) of R
• It is the most general form possible for
projection-based normalization
124
• Domain/key normal form (DKNF) offers a complete solution to the problem of avoiding modification anomalies
• A relation is in DKNF when every constraint on it follows from its domain constraints and key constraints. A key uniquely identifies each row in a table.
• By enforcing key and domain restrictions, the database is assured of being free from any modification anomaly.
125
• Ronald Fagin (1981) proved that if a relation is in DKNF then it is free from anomalies (redundancies), including the ones caused by FDs, MVDs and JDs.
• DKNF seems simple enough, so why all the hoopla about 1NF, 2NF, 3NF, BCNF, 4NF and 5NF?
126
DKNF is not always achievable, and there is no known general procedure to verify whether a relation schema is in DKNF.
In short, sets of single-theme tables will most likely be in DKNF.
127
Denormalization
• Denormalization is said to be necessary to
improve performance
• Technically normalization is a model
concept, not related to stored files
• In practice, denormalization will speed up
some queries, and drag down others