Cs3431 Normalization. cs3431 Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert, delete and update

cs3431

Normalization

cs3431

Why Normalization?

To remove potential redundancy in design Redundancy causes several anomalies: insert,

delete and update Redundancy wastes storage, and often slows

down query processing

Examples to follow next.

cs3431

What is Normalization?

Normalization uses concept of dependencies Functional Dependencies

Technique used: Decomposition Break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D)

cs3431

Insert Anomaly

sNumber sName pNumber pName

s1 Dave p1 MM

s2 Greg p2 ER

Student

Question: Could we insert any professor ?Note: We cannot insert a professor who has no students.

Insert Anomaly: We are not able to insert “valid” value/(s)

cs3431

Delete Anomaly


s1 Dave p1 MM

s2 Greg p2 ER

Student

Question: Can we delete a student and keep a professor info ?Note: We cannot delete a student that is the only student of a professor.

Note: In both cases, minimum cardinality of Professor in the correspondingER schema is 0

Student

sNumber

sName

Professor

pNumber

pName

HasAdvisor

(1,1) (0,1)

years

Delete Anomaly: We are not able to perform a delete without losing some “valid” information.

cs3431

Update Anomaly


s1 Dave p1 MM

s2 Greg p1 MM

Student

Question: Can we simply update a professor’s name ?Note: To update the name of a professor, we have to update in multiple tuples.

Student

sNumber

sName

Professor

pNumber

pName

HasAdvisor

(1,1) (0,*)

years

Note the maximum cardinality of Professor in the corresponding ER schema is *

Update Anomaly: To update a value, we have to update multiple rows. Update anomalies are due to redundancy.

cs3431

Normalization

Need a method to find such “dependencies” exist between attributes Functional dependencies

Need a method to remove such harmful dependencies, when they exist Relational decomposition

cs3431

Keys : Revisited

A key for a relation R (a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values for all attributes of R.

A key is minimal: no subset of K is a key.

A superkey need not be minimal

A prime attribute: an attribute that is part of a key

cs3431

Functional Dependencies (FDs)

sNumber sName address

1 Dave 144FL

2 Greg 320FL

Student

Suppose we have the FD: sName address

That is, there is a function from sName to address

Meaning: For any two rows in the Student relation with the same value for sName, the value for address must be same.

cs3431

FD and Keys

sNumber sName address

1 Dave 144FL

2 Greg 320FL

Student

Questions : • Does a key implies functional dependencies? Which ones ?• Does a functional dependency imply keys ? Which ones ?

Observation : Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R.

Primary Key : <sNumber>FD : sName address

cs3431

Properties of FDs

Consider A, B, C, Z are sets of attributes

Reflexive (trivial FD): if A B, then A B Transitive: if A B, and B C, then A C Augmentation: if A B, then AZ BZ Union: if A B, A C, then A BC Decomposition: if A BC, then A B, A C

Note: Sound and complete inference rules for FDs

cs3431

Inferring FDs

Suppose we have : a relation R (A, B, C) and functional dependencies A B, B C, C A

Questions : What is a key for R? Should we split R into multiple relations?

We can infer A ABC, B ABC, C ABC. Hence A, B, C are all keys.

cs3431

Reasoning About FDs

An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold. = closure of F is the set of all FDs that

are implied by F.

Computing closure of a set of FDs can be expensive. Size of closure is exponential in # attrs!

F

cs3431

Reasoning About FDs

Instead of computing closure F+ of a set of FDs Too expensive

Typically, we just need to know if a given FD X Y is in closure of a set of FDs F.

Algorithm for efficient check: Compute attribute closure of X (denoted ) wrt F:

Set of all attributes A such that X A is in There is a linear time algorithm to compute this.

Check if Y is in X+ . If yes, then X A in F+.

X

F

cs3431

Reasoning About FDs (Contd.)

Does F = {A B, B C, C D E } imply A E?

Question : i.e, is A E in the functional set closure ?

Equivalent Question : Is E in the attribute closure ?

A

F

cs3431

Algorithm for Inference of FDs

Computing the closure of set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+

1. Let X = {A1, A2, …, An}

2. If there exists a FD : B1, B2, …, Bm C, such that every Bi X, then X = X C

3. Repeat step 2 until no more attributes can be added.

4. {A1, A2, …, An}+ = X

cs3431

Inferring FDs: Example 1

Consider R (A, B, C, D, E) with FDs A B, B C, CD E Does A E? (Is A E in F+ ?)

Rephrase as : Is E in A+ ? Let us compute {A}+

{A}+ = {A, B, C} Therefore, A E is false

cs3431

Inferring FDs: Example 2

Given R (A, B, C), and FDs : A B, B C, C A What are possible keys for R ?

Compute the closure of attributes: {A}+ = {A, B, C} {B}+

= {A, B, C} {C}+

= {A, B, C}

So keys for R are <A>, <B>, <C>

cs3431

Decomposing Relations


s1 Dave p1 MM

s2 Greg p2 MM

StudentProf

FDs: pNumber pName

sNumber sName pNumber

s1 Dave p1

s2 Greg p2

Student

pNumber pName

p1 MM

p2 MM

Professor

cs3431

Decomposition

Decomposition:

Must be Lossless (no spurious tuples)

cs3431

Decomposition: Lossless Join Property

sNumber sName pName

S1 Dave MM

S2 Greg MM

Student

pNumber pName

p1 MM

p2 MM

Professor


s1 Dave p1 MM

s1 Dave p2 MM

s2 Greg p1 MM

s2 Greg p2 MM

StudentProf

SpuriousTuples

cs3431

Normalization

How do I decide if I need to further decompose?

Once decided, what is the algorithm for decomposing?

Documents

Cs3431 Normalization. cs3431 Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert, delete and update