Database Design & Schema Refinement
Professor Navneet Goyal
Department of Computer Science & Information Systems
BITS, Pilani


Page 2: Database Design & Schema Refinement

© Prof. Navneet Goyal, BITS, Pilani

Topics
• Database Design Steps
• Redundancy
• Schema Refinement
  • Minimizing Redundancy
  • Functional Dependencies (FDs)
  • Normalization using FDs
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce-Codd Normal Form (BCNF)

Page 3: Database Design & Schema Refinement

Database Design Steps
• Requirements Analysis
• Conceptual Modeling (ER Model)
• Logical Modeling (Relational Model)
• Schema Refinement (Normalization)

Page 4: Database Design & Schema Refinement

Redundancy
• The same information stored in many places in the DB
• Problems:
  • Wastage of space
  • Update anomalies
    • Update Anomaly
    • Insert Anomaly
    • Delete Anomaly
• Normalization is used for "minimizing" redundancy

Page 5: Database Design & Schema Refinement

Update Anomalies
Consider the relation:
EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
• Update Anomaly: Changing the name of project number P1 from "Billing" to "Customer-Accounting" requires the update to be made in the rows of all 100 employees working on project P1
• Insert Anomaly: Cannot insert a project unless an employee is assigned to it
  • Conversely, cannot insert an employee unless he/she is assigned to a project

Page 6: Database Design & Schema Refinement

Update Anomalies
Consider the relation:
EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
• Delete Anomaly: Deleting a project also deletes every employee who works only on that project. Conversely, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project

Page 7: Database Design & Schema Refinement

Solution
Decompose the relation:
EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
into the following smaller relations:
EMP (Emp#, Ename)
PROJ (Proj#, Pname)
EMP_PROJ (Emp#, Proj#, No_hours)
• What happened to the update anomalies?
• We need to find out the basis for decomposing a relation to get rid of update anomalies
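To make the effect of this decomposition concrete, here is a minimal Python sketch. The three sample rows, the employee names and the helper `project` are hypothetical and only illustrate the idea: after projecting EMP_PROJ onto the three smaller schemas, each project name is stored once.

```python
# Hypothetical EMP_PROJ rows: (emp_no, proj_no, ename, pname, no_hours)
emp_proj = [
    ("E1", "P1", "Asha", "Billing", 10),
    ("E2", "P1", "Ravi", "Billing", 20),
    ("E1", "P2", "Asha", "Payroll", 5),
]

def project(rows, *cols):
    """Relational projection: keep the given column positions, drop duplicates."""
    return sorted({tuple(row[c] for c in cols) for row in rows})

emp = project(emp_proj, 0, 2)          # EMP(Emp#, Ename)
proj = project(emp_proj, 1, 3)         # PROJ(Proj#, Pname)
works_on = project(emp_proj, 0, 1, 4)  # EMP_PROJ(Emp#, Proj#, No_hours)

# "Billing" is repeated in the wide relation but stored once in PROJ,
# so renaming project P1 touches a single row after decomposition.
billing_before = sum(1 for row in emp_proj if row[3] == "Billing")
billing_after = sum(1 for row in proj if row[1] == "Billing")
```

A project with no employees can now be inserted into `proj` alone, which is what removes the insert anomaly as well.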

Page 8: Database Design & Schema Refinement

Redundancy
• Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
• Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
• Decomposition should be used judiciously:
  • Is there a reason to decompose a relation?
  • What problems (if any) does the decomposition cause?

Page 9: Database Design & Schema Refinement

Functional Dependencies
• Constraints on the set of legal relations
• Require that the value for a certain set of attributes uniquely determines the value for another set of attributes
• A functional dependency is a generalization of the notion of a key

Page 10: Database Design & Schema Refinement

Functional Dependencies
• A functional dependency X → Y holds over relation R if, for every allowable instance r of R:
  ∀ t1 ∈ r, t2 ∈ r: πX(t1) = πX(t2) implies πY(t1) = πY(t2)
  i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)
• An FD is a statement about all allowable instances of a relation
  • Must be identified based on the semantics of the application.
  • Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R!
• K is a candidate key for R means that K → R
  • However, K → R does not require K to be minimal!

Page 11: Database Design & Schema Refinement

Functional Dependencies
• Let R be a relation schema, α ⊆ R and β ⊆ R
• The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
  t1[α] = t2[α] ⇒ t1[β] = t2[β]
• Example: Consider r(A, B) with the following instance of r:
  A  B
  1  4
  1  5
  3  7
• On this instance, A → B does NOT hold, but B → A does hold.
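The definition can be checked mechanically against a single instance. The sketch below is one possible way to do it; the function name `fd_holds` and the list-of-dicts representation are assumptions made for illustration. Note that it can only show that an instance violates an FD; it cannot prove that the FD holds over the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two tuples agreeing on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

# The instance of r(A, B) from the slide.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
a_to_b = fd_holds(r, ["A"], ["B"])
b_to_a = fd_holds(r, ["B"], ["A"])
```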

Page 12: Database Design & Schema Refinement

Functional Dependencies
[Figure: two tuples t and u over the A's and the B's, illustrating A → B: if t & u agree on the A's, then they must agree on the B's.]

Page 13: Database Design & Schema Refinement

Functional Dependencies
• K is a superkey for relation schema R if and only if K → R
• K is a candidate key for R if and only if
  • K → R, and
  • for no α ⊂ K, α → R
• Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
  bor_loan = (customer_id, loan_number, amount)
  We expect this functional dependency to hold:
  loan_number → amount
  but would not expect the following to hold:
  amount → customer_id

Page 14: Database Design & Schema Refinement

Functional Dependencies
• A functional dependency is trivial if it is satisfied by all instances of a relation
• Examples:
  • customer_name, loan_number → customer_name
  • customer_name → customer_name
• In general, α → β is trivial if β ⊆ α

Page 15: Database Design & Schema Refinement

Functional Dependencies
Consider the relation:
PLOTS (Prop#, State, Plot#, Area, Price, Tax_rate)
• Information about plots available in India. The constraints on the relation are:
  • Prop# is unique throughout India
  • Plot# values are unique within a given state
  • For a given state, Tax_rate is fixed
  • Plots having the same area have the same price, irrespective of the state in which they are located
• Write all the FDs on the relation PLOTS

Page 16: Database Design & Schema Refinement

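Functional Dependencies
PLOTS (Prop#, State, Plot#, Area, Price, Tax_rate)
• FD1 (PK): Prop# → State, Plot#, Area, Price, Tax_rate
• FD2 (CK): State, Plot# → Prop#, Area, Price, Tax_rate
• FD3: State → Tax_rate
• FD4: Area → Price
• Identify redundancy in PLOTS
• Identify update anomalies in PLOTS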

Page 17: Database Design & Schema Refinement

Functional Dependencies
PLOTS decomposed:
• (Prop#, State, Plot#, Area), with FD1 (PK) and FD2 (CK)
• (Area, Price), with FD4
• (State, Tax_rate), with FD3

Page 18: Database Design & Schema Refinement

Normal Forms
• Normal Forms based on PK
  • 2NF
  • 3NF
• Normal Forms based on CKs
  • Boyce-Codd Normal Form (BCNF)
• Other Normal Forms
  • 4NF (Multivalued Dependencies)
  • 5NF (Join Dependencies)
  • Deal with very rare practical situations

Page 19: Database Design & Schema Refinement

2NF
• Based on the concept of Full FDs (FFD)
• If A & B are sets of attributes of R, B is said to be fully functionally dependent (FFD) on A if A → B, but no proper subset of A determines B
• No partial dependencies on the PK
• Is PLOTS in 2NF? YES
  • Single-attribute PK
  • All relations with a single-attribute PK are in 2NF!
• 2NF applies to relations with composite keys

Page 20: Database Design & Schema Refinement

2NF
• A relation that is in 1NF, and in which every non-PK attribute is fully functionally dependent on the PK, is said to be in 2NF
[Figure: 1NF → 2NF: remove all partial dependencies]

Page 21: Database Design & Schema Refinement

3NF
• Based on the concept of transitive dependency
• No non-PK attribute should be transitively dependent on the PK
• Transitive Dependency: If A → B & B → C, then A transitively determines C through B, provided B & C do not determine A
• Is PLOTS in 3NF? NO

Page 22: Database Design & Schema Refinement

3NF
PLOTS (Prop#, State, Plot#, Area, Price, Tax_rate), with FD1 (PK), FD2 (CK), FD3 and FD4
• Prop# transitively determines Tax_rate through State
• Prop# transitively determines Price through Area

Page 23: Database Design & Schema Refinement

3NF
• A relation that is in 1NF & 2NF, and in which no non-PK attribute is transitively dependent on the PK, is said to be in 3NF
[Figure: 2NF → 3NF: remove all transitive dependencies]

Page 24: Database Design & Schema Refinement

BCNF
• Based on FDs that take into account all candidate keys of a relation
• For a relation with only one CK, 3NF & BCNF are equivalent
• A relation is said to be in BCNF if every determinant is a CK
• Is PLOTS in BCNF? NO
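The "NO" for PLOTS can be confirmed with a short check. The sketch below assumes the four FDs implied by the constraints stated for PLOTS, and tests whether the left side of each FD determines every attribute (that is, whether the determinant is at least a superkey). The helper name `closure` is an illustrative choice.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}
fds = [
    ({"Prop#"}, R - {"Prop#"}),                    # FD1: primary key
    ({"State", "Plot#"}, R - {"State", "Plot#"}),  # FD2: candidate key
    ({"State"}, {"Tax_rate"}),                     # FD3
    ({"Area"}, {"Price"}),                         # FD4
]

# Determinants that do not determine all of R violate BCNF.
violations = [sorted(lhs) for lhs, _ in fds if closure(lhs, fds) != R]
```

FD3 and FD4 are the offending dependencies, which matches the transitive dependencies identified for 3NF.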

Page 25: Database Design & Schema Refinement

Problem 1
Consider the relation R(A, B, C) with functional dependencies AB → C and C → B.
• Is R in 2NF?
• Is R in 3NF?
• Is R in BCNF?

Page 26: Database Design & Schema Refinement

Problem 2
For the relation R(A, B, C, D), the functional dependencies are A → B, A → C, A → D, & B → A.
• Find the candidate keys of R
• List transitive dependencies in R (assume any CK as PK)
• Find the highest current normal form of R

Page 27: Database Design & Schema Refinement

Closure of a set of FDs
• Given a set of FDs F on a relation R, several other FDs may also have to hold for R
• For example, if R = (A, B, C) & the FDs A → B & B → C hold in R, then the FD A → C also holds on R
  • For a given value of A, there can be only one corresponding value of B, & for that value of B, there can be only one corresponding value of C
• The closure of F is the set of all FDs that can be inferred from F, & is denoted by F+

Page 28: Database Design & Schema Refinement

Equivalent Set of FDs
• Two sets of FDs, S & T, are equivalent if the set of relation instances satisfying S is exactly the same as the set of relation instances satisfying T
• S follows from T if every relation instance that satisfies T also satisfies all FDs in S
• S & T are equivalent iff S follows from T, & T follows from S

Page 29: Database Design & Schema Refinement

Trivial, Non-trivial & Completely Non-trivial FDs
For an FD A → B:
• Trivial: if the B's are a subset of the A's
• Non-trivial: if at least one of the B's is not among the A's
• Completely non-trivial: if none of the B's is also one of the A's

Page 30: Database Design & Schema Refinement

Trivial Dependency Rule
The FD A1A2A3…An → B1B2B3…Bm is equivalent to A1A2A3…An → C1C2C3…Ck, where the C's are all those B's that are not A's

Page 31: Database Design & Schema Refinement

Closure of a set of FDs
• It is not sufficient to consider just the given set of FDs
• We need to consider all FDs that hold
• Given F, more FDs can be inferred
• Such FDs are said to be logically implied by F
• F+ is the set of all FDs logically implied by F
• We can compute F+ using the formal definition of FD
• If F were large, this process would be lengthy & cumbersome
• Axioms or Rules of Inference provide a simpler technique
• Armstrong's Axioms

Page 32: Database Design & Schema Refinement

Inference Rules for FDs
Armstrong's inference rules:
• IR1. (Reflexive) If Y ⊆ X, then X → Y
• IR2. (Augmentation) If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z)
• IR3. (Transitive) If X → Y and Y → Z, then X → Z
IR1, IR2, IR3 form a sound & complete set of inference rules
• Sound: never generate any wrong FD
• Complete: generate all FDs that hold

Page 33: Database Design & Schema Refinement

Inference Rules for FDs
Some additional inference rules that are useful:
• IR4. (Decomposition) If X → YZ, then X → Y & X → Z
• IR5. (Union) If X → Y & X → Z, then X → YZ
• IR6. (Pseudotransitivity) If X → Y & WY → Z, then WX → Z
The above three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
Prove all six rules (IR1 – IR6) using the definition of FD, either by direct proof or by contradiction

Page 34: Database Design & Schema Refinement

Inference Rules for FDs
IR1. (Reflexive) If Y ⊆ X, then X → Y
Proof: Let Y ⊆ X, and let t1 & t2 be tuples of some instance r of R such that t1[X] = t2[X]. Then t1[Y] = t2[Y] because Y ⊆ X.
IR2. (Augmentation) If X → Y, then XZ → YZ
Proof by contradiction: Assume X → Y holds but XZ → YZ does not. Then there must exist two tuples t1 & t2 such that
1. t1[X] = t2[X]
2. t1[Y] = t2[Y]
3. t1[XZ] = t2[XZ]
4. t1[YZ] ≠ t2[YZ]
This is not possible, because from 1 & 3 we deduce 5. t1[Z] = t2[Z], & from 2 & 5 we deduce 6. t1[YZ] = t2[YZ], contradicting 4

Page 35: Database Design & Schema Refinement

Example
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
Some members of F+:
• A → H
  • by transitivity from A → B and B → H
• AG → I
  • by augmenting A → C with G, to get AG → CG, and then transitivity with CG → I
• CG → HI
  • by the union rule

Page 36: Database Design & Schema Refinement

Procedure for Computing F+
To compute the closure of a set of functional dependencies F:
  F+ = F
  repeat
    for each functional dependency f in F+
      apply reflexivity and augmentation rules on f
      add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
      if f1 and f2 can be combined using transitivity
        then add the resulting functional dependency to F+
  until F+ does not change any further
NOTE: We shall see an alternative procedure for this task later

Page 37: Database Design & Schema Refinement

Closure of Attribute Sets
• The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
• Algorithm for computing the closure: compute F+, take all FDs with X on the LHS, & take the union of the RHS of all such FDs
• X+ can also be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
• Both these approaches become cumbersome if F is large & consequently F+ is larger

Page 38: Database Design & Schema Refinement

Closure of Attribute Sets
• Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F
• Algorithm to compute α+, the closure of α under F:
  result := α;
  while (changes to result) do
    for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
• Try to find out why this algorithm works!
• What is the complexity of this algorithm?
• Can you do any better?
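The attribute-closure algorithm translates almost line for line into code. The following is a minimal sketch, assuming single-letter attribute names so that an FD can be written as a pair of strings such as `("CG", "H")`; the function name `closure` is an illustrative choice.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:            # "for each beta -> gamma in F"
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # "result := result U gamma"
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_plus = closure("AG", F)
a_plus = closure("A", F)
```

Each pass over F either adds at least one attribute or terminates the loop, so the number of passes is bounded by the number of attributes; this naive version is therefore quadratic in the size of F in the worst case.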

Page 39: Database Design & Schema Refinement

Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
1. Is AG a superkey?
   1. Does AG → R? == Is (AG)+ ⊇ R
2. Is any subset of AG a superkey?
   1. Does A → R? == Is (A)+ ⊇ R
   2. Does G → R? == Is (G)+ ⊇ R
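The two-step candidate-key test above can be sketched as follows. The helper names `is_superkey` and `is_candidate_key` are assumptions for illustration. Because any superset of a superkey is a superkey, it is enough to check the subsets obtained by dropping one attribute at a time.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """Step 1: attrs is a superkey if its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """Step 2: a candidate key is a superkey with no superkey as a proper subset."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_is_key = is_candidate_key("AG", R, F)
```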

Page 40: Database Design & Schema Refinement

Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
  • To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R.
• Testing functional dependencies:
  • To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
  • That is, we compute α+ by using attribute closure, and then check if it contains β.
  • This is a simple and cheap test, and very useful
• Computing the closure of F:
  • For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.

Page 41: Database Design & Schema Refinement

Canonical Cover
• Sets of functional dependencies may have redundant dependencies that can be inferred from the others
  • For example: A → C is redundant in {A → B, B → C, A → C}
  • Parts of a functional dependency may be redundant
    • E.g., on the RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
    • E.g., on the LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a "minimal" set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies

Page 42: Database Design & Schema Refinement

Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
  - every FD in F can be inferred from G, &
  - every FD in G can be inferred from F
• Hence, F and G are equivalent if F+ = G+
• Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs

Page 43: Database Design & Schema Refinement

Extraneous Attributes
• Consider a set F of functional dependencies and the functional dependency α → β in F.
  • Attribute A is extraneous in α if A ∈ α and F logically implies (F - {α → β}) ∪ {(α - A) → β}.
  • Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F - {α → β}) ∪ {α → (β - A)} logically implies F.
• Note: implication in the opposite direction is trivial in each of the cases above, since a "stronger" functional dependency always implies a weaker one
• Example: Given F = {A → C, AB → C}
  • B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e., the result of dropping B from AB → C).
• Example: Given F = {A → C, AB → CD}
  • C is extraneous in AB → CD since AB → C can be inferred even after deleting C

Page 44: Database Design & Schema Refinement

Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α:
  1. compute ({α} - A)+ using the dependencies in F
  2. check that ({α} - A)+ contains β; if it does, A is extraneous in α
• To test if attribute A ∈ β is extraneous in β:
  1. compute α+ using only the dependencies in F' = (F - {α → β}) ∪ {α → (β - A)}
  2. check that α+ contains A; if it does, A is extraneous in β

Page 45: Database Design & Schema Refinement

Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
  • F logically implies all dependencies in Fc, and
  • Fc logically implies all dependencies in F, and
  • No functional dependency in Fc contains an extraneous attribute, and
  • Each left side of a functional dependency in Fc is unique.
• To compute a canonical cover for F:
  repeat
    Use the union rule to replace any dependencies in F of the form α1 → β1 and α1 → β2 with α1 → β1β2
    Find a functional dependency α → β with an extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
  until F does not change
• Note: The union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied

Page 46: Database Design & Schema Refinement

Computing Canonical Cover
R = (A, B, C)
F = {A → BC, B → C, A → B, AB → C}
• Combine A → BC and A → B into A → BC
  • Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
  • Check if the result of deleting A from AB → C is implied by the other dependencies
    • Yes: in fact, B → C is already present!
  • Set is now {A → BC, B → C}
• C is extraneous in A → BC
  • Check if A → C is logically implied by A → B and the other dependencies
    • Yes: using transitivity on A → B and B → C.
    • Can use attribute closure of A in more complex cases
• The canonical cover is: A → B, B → C
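The repeat-until procedure for a canonical cover can be sketched in Python as below. This is one possible implementation under stated assumptions: attribute names are single letters, FDs are pairs of strings, and the helper names (`closure`, `_simplify_once`, `canonical_cover`) are invented for illustration. Keeping the FDs in a dictionary keyed by left side applies the union rule automatically. Canonical covers are not unique in general, and this sketch may remove extraneous attributes in a different order than the worked example.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def _simplify_once(cover):
    """Remove one extraneous attribute from cover (dict: lhs -> rhs); True if one was found."""
    fds = list(cover.items())
    for lhs, rhs in fds:
        # Extraneous on the left: (lhs - a)+ under F already contains rhs.
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, fds):
                del cover[lhs]
                cover.setdefault(reduced, set()).update(rhs)  # union rule
                return True
        # Extraneous on the right: lhs+ under F' (with a dropped) still contains a.
        for a in sorted(rhs):
            trimmed = [(l, (r - {a}) if l == lhs else r) for l, r in fds]
            if a in closure(lhs, trimmed):
                rhs.discard(a)
                if not rhs:
                    del cover[lhs]
                return True
    return False

def canonical_cover(fds):
    cover = {}
    for lhs, rhs in fds:                       # union rule on the input
        cover.setdefault(frozenset(lhs), set()).update(rhs)
    while _simplify_once(cover):               # repeat until F does not change
        pass
    return sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in cover.items())

fc = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
```

On the example F this yields A → B and B → C, the same cover as the slide.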

Page 47: Database Design & Schema Refinement

Problems with Decompositions
There are three potential problems to consider:
• Some queries become more expensive
  • e.g., What is the price of prop# 1?
• Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation!
  • Fortunately, not in the PLOTS example
  • How can we tell?
• Checking some dependencies may require joining the instances of the decomposed relations.
  • Fortunately, not in the PLOTS example
  • How can we tell?
• Tradeoff: Must consider these issues vs. redundancy

Page 48: Database Design & Schema Refinement

Lossy Decomposition
Original relation:
  A  B  C
  1  2  3
  4  5  6
  7  2  8
Projections:
  A  B        B  C
  1  2        2  3
  4  5        5  6
  7  2        2  8
JOIN of the projections (the last two rows are spurious tuples):
  A  B  C
  1  2  3
  4  5  6
  7  2  8
  1  2  8
  7  2  3
• Note that we can never get anything less than the original relation
• Since we don't know which tuples are spurious and which are genuine, we have indeed lost information
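The spurious tuples can be reproduced directly. The sketch below uses Python sets of tuples to stand in for relations (an illustrative modelling choice), projects the instance onto (A, B) and (B, C), and joins the projections back on B.

```python
# The original instance of r(A, B, C) from the slide.
r = {(1, 2, 3), (4, 5, 6), (7, 2, 8)}

ab = {(a, b) for a, b, _ in r}   # projection on (A, B)
bc = {(b, c) for _, b, c in r}   # projection on (B, C)

# Natural join of the two projections on the shared attribute B.
joined = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}

spurious = joined - r            # tuples that were never in r
```

The join always contains the original relation; the decomposition is lossy exactly when `spurious` is non-empty.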

Page 49: Database Design & Schema Refinement

Lossy Decomposition
Relation S:
  S#  Status  City
  S3  30      Paris
  S5  30      Athens
Decomposition 1:
  S#  Status        S#  City
  S3  30            S3  Paris
  S5  30            S5  Athens
Decomposition 2:
  S#  Status        Status  City
  S3  30            30      Paris
  S5  30            30      Athens

Page 50: Database Design & Schema Refinement

Lossless Decomposition
• Observe that S satisfies the FDs: S# → Status & S# → City
• It cannot be a coincidence that S is equal to the join of its projections on {S#, Status} & {S#, City}
• Heath's Theorem: Let R{A, B, C} be a relation, where A, B, & C are sets of attributes. If R satisfies A → B & A → C, then R is equal to the join of its projections on {A, B} & {A, C}
• Observe that in decomposition 2 the FD S# → City is lost

Page 51: Database Design & Schema Refinement

Lossless Decomposition
• The decomposition of R into R1, R2, …, Rn is lossless if, for any instance r of R,
  r = πR1(r) ⋈ πR2(r) ⋈ … ⋈ πRn(r)
• We can replace R by R1 & R2, knowing that the instance of R can be recovered from the instances of R1 & R2
• We can use FDs to show that decompositions are lossless

Page 52: Database Design & Schema Refinement

Lossless Decomposition
Theorem: A decomposition of R into R1 and R2 is lossless-join wrt FDs F if and only if at least one of the following dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, R1 ∩ R2 forms a superkey of either R1 or R2
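The theorem gives a test that needs only attribute closure. A minimal sketch, applied to the two decompositions of the supplier relation S; the function names are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: (R1 ∩ R2)+ must cover R1 or R2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach

fds = [({"S#"}, {"Status"}), ({"S#"}, {"City"})]
decomposition_1 = is_lossless({"S#", "Status"}, {"S#", "City"}, fds)
decomposition_2 = is_lossless({"S#", "Status"}, {"Status", "City"}, fds)
```

In decomposition 2 the common attribute is Status, which determines nothing else, so the test fails.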

Page 53: Database Design & Schema Refinement

Dependency Preservation
• Let Fi be the set of dependencies in F+ that include only attributes in Ri.
• A decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
• If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.

Page 54: Database Design & Schema Refinement

Testing for Dependency Preservation
• To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn, we apply the following test (with attribute closure done with respect to F):
  result = α
  while (changes to result) do
    for each Ri in the decomposition
      t = (result ∩ Ri)+ ∩ Ri
      result = result ∪ t
• If result contains all attributes in β, then the functional dependency α → β is preserved.
• We apply the test on all dependencies in F to check if a decomposition is dependency preserving
• This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+
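The polynomial-time test can be written down directly from the pseudocode. In this sketch the function name `is_preserved` and the set-based representation are assumptions; the second example reuses the supplier relation S, where S# → City is lost under the decomposition into {S#, Status} and {Status, City}.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs can be enforced without joining the Ri."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            t = closure(result & set(ri), fds) & set(ri)  # (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

F = [("A", "B"), ("B", "C")]
all_preserved = all(is_preserved(l, r, ["AB", "BC"], F) for l, r in F)

S_fds = [({"S#"}, {"Status"}), ({"S#"}, {"City"})]
city_preserved = is_preserved({"S#"}, {"City"},
                              [{"S#", "Status"}, {"Status", "City"}], S_fds)
```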

Page 55: Database Design & Schema Refinement

Example
• R = (A, B, C), F = {A → B, B → C}, Key = {A}
• R is not in BCNF
• Decomposition R1 = (A, B), R2 = (B, C)
  • R1 and R2 in BCNF
  • Lossless-join decomposition
  • Dependency preserving

Page 56: Database Design & Schema Refinement

4NF
• BCNF removes any anomalies due to FDs
• Further research has led to the identification of another type of dependency called Multi-valued Dependency (MVD)
• Proposed by R Fagin* in 1977
• MVDs can also cause data redundancy
• MVDs are a generalization of FDs
* R Fagin: "Multi-valued Dependencies & a new normal form for relational databases," ACM TODS 2, No. 3 (Sept. 1977)

Page 57: Database Design & Schema Refinement

4NF
Consider the following relation HCTX:
  Course  Teacher               Texts
  DBS     N Goyal, J P Misra    Garcia, Raghu
  ADBS    J P Misra             Connolly, Garcia
• In relational databases, repeating groups are not allowed

Page 58: Database Design & Schema Refinement

4NF: 1NF Version
CTX
  COURSE  TEACHER    TEXTS
  DBS     N GOYAL    GARCIA
  DBS     N GOYAL    RAGHU R
  DBS     J P MISRA  GARCIA
  DBS     J P MISRA  RAGHU R
  ADBS    J P MISRA  GARCIA
  ADBS    J P MISRA  CONNOLLY
• NO FDs in this relation

Page 59: Database Design & Schema Refinement

4NF: Highest Normal Form?
CTX (same instance as in the 1NF version above)
• BCNF?

Page 60: Database Design & Schema Refinement

4NF: Anomalies?
CTX (same instance as in the 1NF version above)
• MANY!!

Page 61: Database Design & Schema Refinement

4NF
Anomalies
• New teacher for DBS
• New text for ADBS
• Teacher teaching DBS leaves

Page 62: Database Design & Schema Refinement

4NF
Points to note:
• If (c, t1, x1) and (c, t2, x2) both appear, then (c, t1, x2) and (c, t2, x1) will also appear.
• Teachers and texts are completely independent of one another
• CTX has no FDs at all
• CTX is in BCNF
• Any all-key relation must necessarily be in BCNF!
• But there is still a need to normalize CTX

Page 63: Database Design & Schema Refinement

4NF
Decompose CTX into CT & TX
CT                       TX
  COURSE  TEACHER          COURSE  TEXT
  DBS     N GOYAL          DBS     GARCIA
  DBS     J P MISRA        DBS     RAGHU R
  ADBS    J P MISRA        ADBS    GARCIA
                           ADBS    CONNOLLY

Page 64: Database Design & Schema Refinement

4NF
• The decomposition of CTX into CT & TX is not done on the basis of FDs (as there are no FDs)
• The decomposition of CTX into CT & TX is done on the basis of MVDs
• MVDs
  • An MVD represents a dependency between attributes of a relation, such that for every value of A, there is a set of values of B & a set of values of C. The sets of values for B & C are independent of each other
  • course →→ teacher (course multi-determines teacher)
  • course →→ text (text is multi-dependent on course)

Page 65: Database Design & Schema Refinement

4NF
Interpretation of course →→ teacher
• A course does not have a single corresponding teacher, i.e. the FD course → teacher does not hold
• Still, each course must have a 'well defined' set of teachers
• For a given course c and a given text x, the set of teachers t matching the pair (c, x) depends on the value of c alone
• It makes no difference which particular value of x we choose
• Interpret course →→ text analogously

Page 66: Database Design & Schema Refinement

4NF
Formal Definition
• Let R be a relation and A, B, C be subsets of attributes of R. We say that A →→ B iff, in every possible legal value of R, the set of B values matching a given (A, C) pair depends only on the value of A and is independent of the C value.
• It can be easily shown that for R(A, B, C), the MVD A →→ B holds iff the MVD A →→ C also holds.
• MVDs always go together in pairs and we write them as
  A →→ B | C
  course →→ teacher | text

Page 67: Database Design & Schema Refinement

4NF
Fagin's Theorem
• Let R(A, B, C) be a relation, where A, B, and C are subsets of attributes of R. Then R is equal to the join of its projections on {A, B} and {A, C} iff R satisfies the MVD A →→ B | C
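Fagin's theorem can be observed on the CTX instance: since course →→ teacher | text holds, joining the two projections gives back exactly the original tuples. The sketch below models relations as Python sets of tuples, which is an illustrative choice; the variable names are hypothetical.

```python
ctx = {
    ("DBS", "N GOYAL", "GARCIA"), ("DBS", "N GOYAL", "RAGHU R"),
    ("DBS", "J P MISRA", "GARCIA"), ("DBS", "J P MISRA", "RAGHU R"),
    ("ADBS", "J P MISRA", "GARCIA"), ("ADBS", "J P MISRA", "CONNOLLY"),
}

ct = {(c, t) for c, t, _ in ctx}   # projection on (course, teacher)
cx = {(c, x) for c, _, x in ctx}   # projection on (course, text)

# Natural join of the projections on course.
rejoined = {(c, t, x) for c, t in ct for c2, x in cx if c == c2}

lossless = rejoined == ctx
```

The two projections hold 3 and 4 tuples in place of the 6 tuples of CTX, and adding a new teacher for DBS becomes a single insert into `ct`.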

Page 68: Database Design & Schema Refinement

4NF
• An MVD A →→ B is trivial if
  (a) B ⊆ A, or
  (b) A ∪ B = R
• A relation is in 4NF if it is in BCNF and, for every non-trivial MVD A →→ B that holds on it, A is a superkey
• CTX is not in 4NF because course →→ teacher is a non-trivial MVD and course is not a superkey

Page 69: Database Design & Schema Refinement

Multi-Valued Dependencies
• The most common source of redundancy in BCNF schemas is putting 2 or more M:M relationships in a single relation

Page 70: Database Design & Schema Refinement

Formal Definition of MVD
The MVD
  A1A2…An →→ B1B2…Bm
holds for a relation R if, for each pair of tuples t & u of R that agree on all the As, we can find a tuple v in R that agrees
1. With both t & u on the As
2. With t on the Bs
3. With u on all attributes of R that are not among the As & Bs
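The three conditions can be checked on a given instance by constructing the required tuple v for every pair (t, u) and testing whether it is present. This is a sketch under assumptions: the function name `mvd_holds` and the representation of a relation as a set of tuples plus an ordered attribute list are illustrative choices. As with FDs, a single instance can only refute an MVD, not prove that it holds over the schema.

```python
def mvd_holds(rows, attrs, lhs, rhs):
    """Check lhs ->> rhs on one instance (rows: set of tuples ordered as attrs)."""
    idx = {a: i for i, a in enumerate(attrs)}

    def pick(row, names):
        return tuple(row[idx[a]] for a in names)

    others = [a for a in attrs if a not in lhs and a not in rhs]
    for t in rows:
        for u in rows:
            if pick(t, lhs) != pick(u, lhs):
                continue
            # v agrees with t on lhs and rhs, and with u on the other attributes.
            want = dict(zip(lhs, pick(t, lhs)))
            want.update(zip(rhs, pick(t, rhs)))
            want.update(zip(others, pick(u, others)))
            if tuple(want[a] for a in attrs) not in rows:
                return False
    return True

attrs = ["course", "teacher", "text"]
ctx = {
    ("DBS", "N GOYAL", "GARCIA"), ("DBS", "N GOYAL", "RAGHU R"),
    ("DBS", "J P MISRA", "GARCIA"), ("DBS", "J P MISRA", "RAGHU R"),
    ("ADBS", "J P MISRA", "GARCIA"), ("ADBS", "J P MISRA", "CONNOLLY"),
}
course_teacher = mvd_holds(ctx, attrs, ["course"], ["teacher"])
teacher_text = mvd_holds(ctx, attrs, ["teacher"], ["text"])
```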

Page 71: Database Design & Schema Refinement

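MVD
[Figure: tuples t, v and u over the A's, the B's and the other attributes, illustrating A →→ B: v agrees with t and u on the A's, with t on the B's, and with u on the others.]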

Page 72: Database Design & Schema Refinement

Problem Solving
Consider a relation R(A, B, C, D, E, F) with the following FDs:
F = {AB → C, BC → AD, D → E, CF → B}
(a) Find out whether AB is a key of R or not.
(b) Use the result of part (a) to find out whether AB → D is implied by F
• (AB)+ = {A, B, C, D, E}
• Attribute F is not in (AB)+, so AB is not a key of R
• If D is in (AB)+, then AB → D is implied by F; D is in (AB)+, so AB → D is implied
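Both parts of the problem reduce to one closure computation. A short sketch follows, with the same hedged `closure` helper as an assumption; the FD set is named `fds` here to avoid clashing with attribute F.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

schema = set("ABCDEF")
fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

ab_plus = closure("AB", fds)
ab_is_superkey = ab_plus >= schema   # part (a): is every attribute reached?
ab_implies_d = "D" in ab_plus        # part (b): AB -> D holds iff D is in (AB)+
```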

Page 73: Database Design & Schema Refinement

Q & A

Page 74: Database Design & Schema Refinement

Thank You