NIE Normalization


Database Systems

Normalization

GCE (A/L) ICT Training for Teachers 1

Contact Persons

Name : Buddhika H. Kasthuriarachchy

Email : buddhika.h@sliit.lk

Phone : 0112 413900 ext: 4301

Mobile : 0773607507


Recommended Reading

https://sites.google.com/site/ictalnie/

Google user name : ictalnie2011

Password : ictalniepython

Fundamentals of Database Systems (5th Edition) - Ramez Elmasri / Shamkant B. Navathe

Database Management Systems (2nd Edition) - Raghu Ramakrishnan / Johannes Gehrke


Introduction

Conceptual Modeling is a subjective process

Therefore, the schema produced by the logical database design phase may not be very good (it may contain redundancies)

However, there are formalisms to ensure that the schema is good.

This process is called Normalization


Relational database schema = set of relations

Relation = set of attributes

How we group the attributes into relations is very important


Too many attributes in a relation:
- Wasted space
- Anomalies

Decomposing the relation into too many smaller relations can violate:
- Lossless-join property
- Dependency-preserving property


Too many attributes

For example,

LECTURER(id, name, address, salary, deptno, dname, building)


Insertion Anomaly

1. Inserting a new lecturer into the LECTURER table
- Department information is repeated (we must ensure that the correct department information is inserted every time).

2. Inserting a department with no lecturers
- Impossible, because null values for id are not allowed.


Deletion Anomalies

Deleting the last lecturer of a department also loses the information about that department


Update Anomalies

Updating a department's building needs to be done for all lecturers working for that department
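The update and deletion anomalies can be seen on a tiny instance of LECTURER. The sketch below is only an illustration: the sample rows and the helper name buildings_of are made up for this example and do not come from the slides.

```python
# Hypothetical sample rows for
# LECTURER(id, name, address, salary, deptno, dname, building).
lecturers = [
    {"id": 1, "name": "Perera", "address": "Colombo", "salary": 50000,
     "deptno": 10, "dname": "ICT", "building": "A"},
    {"id": 2, "name": "Silva", "address": "Kandy", "salary": 55000,
     "deptno": 10, "dname": "ICT", "building": "A"},
    {"id": 3, "name": "Fernando", "address": "Galle", "salary": 52000,
     "deptno": 20, "dname": "Maths", "building": "B"},
]


def buildings_of(rows, deptno):
    """Return the set of buildings recorded for one department."""
    return {r["building"] for r in rows if r["deptno"] == deptno}


# Update anomaly: the building is changed in only one of the two rows
# of department 10, so the table now disagrees with itself.
lecturers[0]["building"] = "C"
inconsistent = buildings_of(lecturers, 10)

# Deletion anomaly: deleting the only lecturer of department 20
# also deletes everything the table knew about that department.
remaining = [r for r in lecturers if r["id"] != 3]
dept20_left = buildings_of(remaining, 20)

print(sorted(inconsistent))  # two different buildings for one department
print(sorted(dept20_left))   # nothing left for department 20
```

Keeping department data in a separate relation keyed by deptno would store each building once, which is what the decomposition discussed next aims for.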


When redundancies exist, we should decompose the relation into smaller relations

Lossless-join property: we might lose information if we decompose relations carelessly

Dependency-preserving property: the set of dependencies in S can be verified by a set of dependencies in R1 and R2


Loss-less join property:

For example,


Relation S:
S    P    D
S1   P1   D1
S2   P2   D2
S3   P1   D3

R1:
S    P
S1   P1
S2   P2
S3   P1

R2:
P    D
P1   D1
P2   D2
P1   D3

Joining them together, we get spurious tuples: (S1, P1, D3) and (S3, P1, D1) were not in S

R1 ⋈ R2:
S    P    D
S1   P1   D1
S1   P1   D3
S2   P2   D2
S3   P1   D1
S3   P1   D3
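The decomposition of S into R1 and R2 and the join of the two pieces can be reproduced in a few lines. This is a minimal sketch that models each relation as a Python set of tuples; the variable names joined and spurious are chosen here for illustration.

```python
# Relation S(S, P, D) from the example.
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Project S onto (S, P) and onto (P, D).
R1 = {(s, p) for s, p, _ in S}
R2 = {(p, d) for _, p, d in S}

# Natural join of R1 and R2 on the shared attribute P.
joined = {(s, p, d) for s, p in R1 for p2, d in R2 if p == p2}

# Tuples produced by the join that were never in S.
spurious = joined - S

print(len(joined))
print(sorted(spurious))
```

The join contains every original tuple plus two extra ones, so the original instance of S cannot be recovered from R1 and R2 alone.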

To avoid the above-mentioned issues in the relational schema, we can apply a formal process called Normalization

Normalization is based on functional dependencies


A functional dependency, denoted by X → Y, where X and Y are sets of attributes in relation R, specifies the following constraint:

Let t1 and t2 be any two tuples of relation R in any given instance

Whenever t1[X] = t2[X], then t1[Y] = t2[Y]

where ti[X] represents the values for X in tuple ti
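The definition translates directly into a check over one instance. The sketch below assumes tuples are stored as dictionaries; the function name fd_holds is hypothetical. Note that an instance can only refute an FD: an FD is a constraint on all legal instances, so passing this check on one instance does not prove it.

```python
def fd_holds(rows, X, Y):
    """Return True if no two rows agree on X but differ on Y."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# The instance of S(S, P, D) used in the lossless-join example.
rows = [
    {"S": "S1", "P": "P1", "D": "D1"},
    {"S": "S2", "P": "P2", "D": "D2"},
    {"S": "S3", "P": "P1", "D": "D3"},
]

print(fd_holds(rows, ["S"], ["P"]))  # S values are all distinct
print(fd_holds(rows, ["P"], ["D"]))  # P1 appears with both D1 and D3
```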


Key points:

Redundancy is based on functional dependencies

Therefore, normalization is based on functional dependencies


Given some FDs, we can usually infer additional FDs:

A → B, B → C implies A → C

An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.

F+ = closure of F is the set of all FDs that are implied by F.

How can we get F+?


Armstrong's Axioms (X, Y, Z are sets of attributes):

Reflexivity: If X ⊆ Y, then Y → X

Augmentation: If X → Y, then XZ → YZ for any Z

Transitivity: If X → Y and Y → Z, then X → Z

These are sound and complete inference rules for FDs!


A couple of additional rules (that follow from AA):

Union: If X → Y and X → Z, then X → YZ

Decomposition: If X → YZ, then X → Y and X → Z

Example: Contracts(cid,sid,jid,did,pid,qty,value), and:

C is the key: C → CSJDPQV

Project purchases each part using a single contract: JP → C

Dept purchases at most one part from a supplier: SD → P

JP → C, C → CSJDPQV imply JP → CSJDPQV

SD → P implies SDJ → JP

SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
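The Contracts derivation can be replayed mechanically by repeatedly applying any FD whose left side is already covered. The sketch below reads the three given FDs as C → CSJDPQV, JP → C and SD → P, writes each attribute as a single letter, and records which FD fired at each step; the function name closure_with_trace is made up for this illustration.

```python
def closure_with_trace(attrs, fds):
    """Grow attrs by applying FDs; return the result and the FDs used."""
    closure = set(attrs)
    trace = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD only if it is applicable and adds something new.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                trace.append((lhs, rhs))
                changed = True
    return closure, trace


CONTRACT_FDS = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

closure, trace = closure_with_trace("SDJ", CONTRACT_FDS)
print("".join(sorted(closure)))
for lhs, rhs in trace:
    print(lhs, "->", rhs)
```

Starting from SDJ, the FDs fire in the order SD → P, JP → C, C → CSJDPQV, which mirrors the three inference steps in the example and ends with every attribute of Contracts.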


Why is F+ important?

X → RHS in relation R

X is a subset of attributes in relation R. If RHS contains all attributes of R, then X is a superkey.

If X is not a superkey, then values for X can repeat in different tuples resulting in redundancy!!!

So determining F+ can help us find superkeys and check for any redundancy.


Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)

Typically, we just want to check if a given FD X → Y is in the closure F+ of a set of FDs F. An efficient check:

Compute the attribute closure of X (denoted X+) wrt F:

Set of all attributes A such that X → A is in F+

There is a linear time algorithm to compute this.

Check if Y is contained in X+


Algorithm to find X+:

closure = X;

repeat until there is no change: {

    if there is an FD U → V in F such that U ⊆ closure,

    then set closure = closure ∪ V

}

Does F = {A → B, B → C, CD → E} imply A → E?

i.e., is A → E in the closure F+? Equivalently, is E in A+?

We can use the attribute closure to find out keys of the relation. If X+ contains all attributes of the relation, then X is a superkey.
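The closure algorithm can be written almost line for line in Python. This is a minimal sketch, not the linear-time algorithm mentioned earlier: the simple loop rescans all FDs on every pass. Attributes are single letters and each FD is a (left, right) pair of strings; the function name attribute_closure is chosen here.

```python
def attribute_closure(X, fds):
    """Return X+, the set of attributes determined by X under fds."""
    closure = set(X)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for U, V in fds:
            if set(U) <= closure and not set(V) <= closure:
                closure |= set(V)
                changed = True
    return closure


F = [("A", "B"), ("B", "C"), ("CD", "E")]

a_plus = attribute_closure("A", F)
print("".join(sorted(a_plus)))
print("E" in a_plus)

# With D added on the left, CD -> E becomes applicable.
ad_plus = attribute_closure("AD", F)
print("".join(sorted(ad_plus)))
```

Under this F, A+ is {A, B, C}, so E is not in A+ and A → E is not implied. AD+ contains all five attributes, so AD is a superkey of a relation over A, B, C, D, E.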


Schema Refinement Steps:

Determine F for relation R

Find all keys of R using attribute closure

Normalize
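The key-finding step can be sketched as a brute-force search: try attribute sets in increasing size and keep those whose closure is the whole relation and that contain no smaller key. This is an illustration only, since it is exponential in the number of attributes; the function names are made up, and the Contracts FDs are read as C → CSJDPQV, JP → C and SD → P.

```python
from itertools import combinations


def attribute_closure(X, fds):
    """Return X+, the set of attributes determined by X under fds."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for U, V in fds:
            if set(U) <= closure and not set(V) <= closure:
                closure |= set(V)
                changed = True
    return closure


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is attrs."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


CONTRACT_FDS = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
keys = candidate_keys("CSJDPQV", CONTRACT_FDS)
print(["".join(sorted(k)) for k in keys])
```

For Contracts this finds three keys: C, JP and SDJ, which agrees with the superkeys derived by hand in the Contracts example.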


There are many Normal Forms proposed to reduce redundancies

Some of the well-known ones are:

1st Normal Form

2nd Normal Form

3rd Normal Form

Boyce-Codd Normal Form


Lossless-join decomposition: a decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:

π_X(r) ⋈ π_Y(r) = r

Theorem: This condition holds if the attributes common to X and Y contain a key for either X or Y

We can find a lossless join decomposition for 1st NF, 2nd NF, 3rd NF and BCNF (will see later)
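The theorem gives a simple test for a two-way decomposition: compute the closure of the common attributes and see whether it covers X or Y. The sketch below assumes single-letter attributes; the function names are hypothetical, and the split of Contracts into SDP and CSJDQV is an illustrative choice, not one taken from the slides.

```python
def attribute_closure(X, fds):
    """Return X+, the set of attributes determined by X under fds."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for U, V in fds:
            if set(U) <= closure and not set(V) <= closure:
                closure |= set(V)
                changed = True
    return closure


def lossless_by_theorem(X, Y, fds):
    """True if the common attributes determine all of X or all of Y."""
    X, Y = set(X), set(Y)
    common_plus = attribute_closure(X & Y, fds)
    return X <= common_plus or Y <= common_plus


CONTRACT_FDS = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

# Common attributes SD determine P, so SD is a key of SDP.
print(lossless_by_theorem("SDP", "CSJDQV", CONTRACT_FDS))

# The earlier S(S, P, D) example split into (S, P) and (P, D),
# with no FDs assumed: P alone determines neither side.
print(lossless_by_theorem("SP", "PD", []))
```

The second call returns False, which is consistent with the spurious tuples seen when R1 and R2 were joined.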


Dependency preserving property:

A decomposition of a relation R, with a set of functional dependencies F, into relations X and Y is said to be dependency preserving iff F+ = (FX ∪ FY)+, where FX and FY are the FDs in F+ that involve only attributes of X and only attributes of Y, respectively

That is, a dependency-preserving decomposition allows us to enforce all FDs by examining a single relation instance.

We can always obtain a dependency-preserving decomposition for 1st NF, 2nd NF and 3rd NF, but not necessarily for BCNF (will see later)
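Dependency preservation can be tested without computing F+ explicitly. The sketch below uses a standard textbook procedure: for each FD, grow its left side using only what each decomposed relation can enforce locally, then check that the right side was reached. Function names and both sample decompositions are illustrative assumptions, not taken from the slides.

```python
def attribute_closure(X, fds):
    """Return X+, the set of attributes determined by X under fds."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for U, V in fds:
            if set(U) <= closure and not set(V) <= closure:
                closure |= set(V)
                changed = True
    return closure


def preserves_dependencies(decomposition, fds):
    """True if every FD in fds follows from the projected FDs."""
    parts = [set(r) for r in decomposition]
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for r in parts:
                # Attributes of r that z determines from inside r.
                gained = attribute_closure(z & r, fds) & r
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


# R(A, B, C) with A -> B and B -> C, split into AB and BC.
print(preserves_dependencies(["AB", "BC"], [("A", "B"), ("B", "C")]))

# Contracts split into SDP and CSJDQV: JP -> C spans both relations.
CONTRACT_FDS = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
print(preserves_dependencies(["SDP", "CSJDQV"], CONTRACT_FDS))
```

In the second case JP → C cannot be checked inside either relation, so enforcing it would need a join.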


Review of some terms

Superkey: A set of attributes S in relation R such that no two distinct tuples t1 and t2 will have t1[S] = t2[S]

Key: A key is a superkey with the additional property that removing any attribute from it leaves a set that no longer satisfies the key condition


Candidate Key: Each key of a relation is called a candidate key

Primary Key: A candidate key is chosen to be the primary key

Prime Attribute: an attribute which is a member of a candidate key

Nonprime Attribute: An attribute which is not prime


1st Normal Form

A relation R is in first normal form (1NF) if domains of all attributes in the relation are atomic (simple & indivisible).


2nd Normal Form:

A relation R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of R (that is, dependent on only part of a key)


Example: EMP_PROJ

EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC)

FD1: {NIC, PNUM} → HOURS

FD2: NIC → ENAME

FD3: PNUM → {PNAME, PLOC}

2NF decomposition:

EP1(NIC, PNUM, HOURS)

EP2(NIC, ENAME)

EP3(PNUM, PNAME, PLOC)
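The 2NF condition can be checked by taking every proper part of every key and seeing which nonprime attributes it already determines. The sketch below reads the EMP_PROJ dependencies as {NIC, PNUM} → HOURS, NIC → ENAME and PNUM → {PNAME, PLOC}, which is an assumption inferred from the decomposition into EP1, EP2 and EP3. The function names are hypothetical, and the keys are passed in rather than computed.

```python
from itertools import combinations


def attribute_closure(X, fds):
    """Return X+, the set of attributes determined by X under fds."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for U, V in fds:
            if set(U) <= closure and not set(V) <= closure:
                closure |= set(V)
                changed = True
    return closure


def partial_dependencies(attrs, keys, fds):
    """Return (part_of_key, nonprime_attribute) pairs that break 2NF."""
    prime = set().union(*keys)
    nonprime = set(attrs) - prime
    found = set()
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty parts only
            for part in combinations(sorted(key), size):
                for a in attribute_closure(part, fds) & nonprime:
                    found.add((part, a))
    return found


EMP_PROJ = ("NIC", "PNUM", "HOURS", "ENAME", "PNAME", "PLOC")
FD1 = (("NIC", "PNUM"), ("HOURS",))
FD2 = (("NIC",), ("ENAME",))
FD3 = (("PNUM",), ("PNAME", "PLOC"))
KEY = {"NIC", "PNUM"}

violations = partial_dependencies(EMP_PROJ, [KEY], [FD1, FD2, FD3])
print(sorted(violations))

# EP1 keeps only FD1, so no part of its key determines anything extra.
ep1_violations = partial_dependencies(("NIC", "PNUM", "HOURS"), [KEY], [FD1])
print(sorted(ep1_violations))
```

ENAME depends on NIC alone, and PNAME and PLOC depend on PNUM alone, which is why those attributes move into EP2 and EP3.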

3rd Normal Form:

A relation R is in 3rd normal form (3NF) if:

R is in 2NF, and

No nonprime attribute is transitively dependent on any key
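A commonly used equivalent test for 3NF looks at each nontrivial FD X → A and accepts it only if X is a superkey or A is a prime attribute. The sketch below applies that test to the EMP_DEPT relation used as the example in this section. The two FDs (SSN determines the employee attributes and DNUM, DNUM determines the department attributes) and the key {SSN} are assumptions made for illustration, and the function name is hypothetical.

```python
def third_nf_violations(keys, fds):
    """Return (lhs, attribute) pairs that break the 3NF test."""
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        # lhs is a superkey exactly when it contains some candidate key.
        lhs_is_superkey = any(set(k) <= set(lhs) for k in keys)
        for a in rhs:
            if a in lhs:
                continue  # trivial dependency, always allowed
            if not lhs_is_superkey and a not in prime:
                violations.append((lhs, a))
    return violations


# Assumed dependencies for
# EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR).
EMP_FD = (("SSN",), ("ENAME", "BDATE", "ADD", "DNUM"))
DEPT_FD = (("DNUM",), ("DNAME", "DMGR"))

emp_dept = third_nf_violations([{"SSN"}], [EMP_FD, DEPT_FD])
print(emp_dept)

# After splitting, each relation is checked against its own key and FD.
ed1 = third_nf_violations([{"SSN"}], [EMP_FD])
ed2 = third_nf_violations([{"DNUM"}], [DEPT_FD])
print(ed1, ed2)
```

In EMP_DEPT the department attributes depend on SSN only through DNUM, which is the transitive dependency that the split into ED1 and ED2 removes.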


Example: EMP_DEPT

EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

3NF decomposition:

ED1(ENAME, SSN, BDATE, ADD, DNUM)

ED2:

DNUM DNAME DM