Chapter 7: Normalization (Chapters 14 & 15 in Textbook)


Database Design

Steps in building a database for an application:

Real-world domain → Conceptual model → DBMS data model → Create schema (DDL) → Modify data (DML)

Normalization

How to produce a good relation schema?

1. Start with a set of relations.

2. Define the functional dependencies of each relation to identify its primary key (PK).

3. Transform the relations to normal form.

Data Redundancy

STAFFBRANCH

StaffNo  FName  LName  Position    Salary  BrnNo  City      Address
SL21     John   White  Manager     30000   B005   London    22 Deer Rd
SG37     Ann    Beech  Assistant   12000   B003   Glasgow   163 Main St
SG14     David  Ford   Supervisor  18000   B003   Glasgow   163 Main St
SA9      Mary   Howe   Assistant   9000    B007   Aberdeen  16 Argyll St
SG5      Susan  Brand  Manager     24000   B003   Glasgow   163 Main St
SL41     Julie  Lee    Assistant   9000    B005   London    22 Deer Rd

The details of branch B003 (Glasgow, 163 Main St) are repeated for every member of staff located at that branch.

Relations that have redundant data may have update anomalies (insert, modify, delete).

STAFF

StaffNo  FName  LName  Position    Salary  BrnNo
SL21     John   White  Manager     30000   B005
SG37     Ann    Beech  Assistant   12000   B003
SG14     David  Ford   Supervisor  18000   B003
SA9      Mary   Howe   Assistant   9000    B007
SG5      Susan  Brand  Manager     24000   B003
SL41     Julie  Lee    Assistant   9000    B005

BRANCH

BrnNo  City      Address
B005   London    22 Deer Rd
B003   Glasgow   163 Main St
B007   Aberdeen  16 Argyll St

Relation Decomposition

The normalization process involves decomposing a relation.

A decomposition is required to be reversible.

Functional dependencies guarantee that a decomposition is reversible.

During normalization, two important properties are associated with decomposition:

1. Lossless-join

2. Dependency preservation

A decomposition of STAFFBRANCH that keeps City, rather than BrnNo, in STAFF (in this example branch B007 is also located in London):

STAFF

StaffNo  FName  LName  Position    Salary  City
SL21     John   White  Manager     30000   London
SG37     Ann    Beech  Assistant   12000   Glasgow
SG14     David  Ford   Supervisor  18000   Glasgow
SA9      Mary   Howe   Assistant   9000    London
SG5      Susan  Brand  Manager     24000   Glasgow
SL41     Julie  Lee    Assistant   9000    London

BRANCH

BrnNo  City     Address
B005   London   22 Deer Rd
B003   Glasgow  163 Main St
B007   London   16 Argyll St

Data Redundancy

STAFFBRANCH

StaffNo  FName  LName  Position    Salary  BrnNo  City     Address
SL21     John   White  Manager     30000   B005   London   22 Deer Rd
SG37     Ann    Beech  Assistant   12000   B003   Glasgow  163 Main St
SG14     David  Ford   Supervisor  18000   B003   Glasgow  163 Main St
SA9      Mary   Howe   Assistant   9000    B005   London   22 Deer Rd
SG5      Susan  Brand  Manager     24000   B003   Glasgow  163 Main St
SL41     Julie  Lee    Assistant   9000    B005   London   22 Deer Rd
SL21     John   White  Manager     30000   B007   London   16 Argyll St
SA9      Mary   Howe   Assistant   9000    B007   London   16 Argyll St
SL41     Julie  Lee    Assistant   9000    B007   London   16 Argyll St

Joining STAFF and BRANCH on City pairs every London staff member with both London branches (B005 and B007), giving nine tuples instead of the original six. Three of them are spurious.

Functional Dependencies

Describes the relationship between attributes in a relation.

If A and B are attributes of relation R, B is functionally dependent on A, denoted by A → B, if each value of A is associated with exactly one value of B. A value of B may be associated with several values of A.

In A → B, A is the determinant and B is the dependent.

• A functional dependency must hold between the attributes of a relation at all times, not only for the data currently stored (all-time functional dependency).

Functional Dependencies

A → B

Whenever two tuples t & u agree on all attributes of A, they must also agree on attribute B.
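The tuple-based definition can be turned into a small check against sample data. The sketch below is only illustrative: the helper name `fd_holds` and the two-column sample are assumptions made for this example, with values taken from the STAFFBRANCH relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on `lhs` always agree on `rhs`."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Sample tuples (two columns only) taken from STAFFBRANCH.
staff = [
    {"StaffNo": "SL21", "position": "Manager"},
    {"StaffNo": "SG37", "position": "Assistant"},
    {"StaffNo": "SG14", "position": "Supervisor"},
    {"StaffNo": "SA9", "position": "Assistant"},
    {"StaffNo": "SG5", "position": "Manager"},
]

print(fd_holds(staff, ["StaffNo"], ["position"]))  # True
print(fd_holds(staff, ["position"], ["StaffNo"]))  # False
```

A check like this can only refute a dependency. Because an FD must hold for all time, a sample in which it happens to hold does not prove it.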

Functional Dependencies: Example

StaffNo → position: position is functionally dependent on StaffNo (1:1 or M:1 relationship between attributes in a relation). For example, SL21 → Manager.

position → StaffNo does NOT hold: StaffNo is not functionally dependent on position (1:M relationship between attributes in a relation). For example, Manager is associated with both SL21 and SG5.

Trivial Functional Dependencies

A → B is trivial if B ⊆ A.

StaffNo, SName → SName

StaffNo, SName → StaffNo

We are not interested in trivial functional dependencies because they provide no genuine integrity constraints on the values held by these attributes.

StaffBranch Example

Functional dependencies on the StaffBranch relation:

StaffNo → FName, LName, position, salary, BranchNo, Address, city

BranchNo → Address, city

Address, city → BranchNo

BranchNo, position → salary

Address, city, position → salary

Determinants:

StaffNo, BranchNo, (Address, city), (BranchNo, position), and (Address, city, position)

Identifying the PK

Purpose of functional dependencies: to specify the set of integrity constraints that must hold on a relation.

The determinant attribute(s) are a candidate key (CK) of the relation if:

• There is a 1:1 relationship between determinant & dependent.

• No proper subset of the determinant attribute(s) is itself a determinant (nontrivial).

If (A, B) → C, then NOT A → B, and NOT B → A

• All attributes that are not part of the CK are functionally dependent on the key: CK → all attributes of R

• The dependencies hold for all time.

The PK is the candidate key with the minimal set of functional dependencies.
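One way to test the conditions above mechanically is to compute, for a set of attributes, everything it determines, and to keep the minimal sets that determine the whole relation. The sketch below does this for the StaffBranch dependencies. The helper names `closure` and `candidate_keys` are assumptions made for this example, and the brute-force search is only practical for small relations.

```python
from itertools import combinations

ATTRS = {"StaffNo", "FName", "LName", "position", "salary",
         "BranchNo", "Address", "city"}

# Each FD is a (determinant, dependent) pair of attribute sets.
FDS = [
    ({"StaffNo"}, {"FName", "LName", "position", "salary",
                   "BranchNo", "Address", "city"}),
    ({"BranchNo"}, {"Address", "city"}),
    ({"Address", "city"}, {"BranchNo"}),
    ({"BranchNo", "position"}, {"salary"}),
    ({"Address", "city", "position"}, {"salary"}),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return the minimal attribute sets that determine all of `attrs`."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


print(candidate_keys(ATTRS, FDS))  # [{'StaffNo'}]
```

With these dependencies StaffNo is the only candidate key, so it is also the PK.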

Closure

Closure X+: the set of all functional dependencies that are implied by (can be inferred from) a given set of functional dependencies X.

Example: X = {A → B, B → C}. If tuples t & u agree on A, then they must agree on B, so surely they will also agree on C. Hence X+ contains A → C.

Closure Example

S: BranchNo → (Address, city)

S+: BranchNo → Address, BranchNo → city (implied by S)

Inference Rules for Functional Dependencies

Armstrong's axioms (inference rules): a set of inference rules that specifies how functional dependencies can be inferred from given ones.

Inference rules:

Reflexivity: If B ⊆ A, then A → B

Augmentation: If A → B, then A,C → B,C

Transitivity: If A → B and B → C, then A → C

Self-determination: A → A

Decomposition: If A → B,C, then A → B and A → C

Union: If A → B and A → C, then A → B,C
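The last three rules are not independent: each can be derived from reflexivity, augmentation and transitivity. As an illustration, a short derivation of the union rule (writing AB for the attribute set A,B):

```latex
\begin{align*}
A \to B &\;\Rightarrow\; A \to AB   && \text{(augment both sides with } A\text{)}\\
A \to C &\;\Rightarrow\; AB \to BC  && \text{(augment both sides with } B\text{)}\\
A \to AB,\ AB \to BC &\;\Rightarrow\; A \to BC && \text{(transitivity)}
\end{align*}
```

Decomposition follows in a similar way: reflexivity gives BC → B, and transitivity with A → BC then gives A → B.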

Minimal Sets of Functional Dependencies

• The complete set of functional dependencies for a relation can be very large.

• We need to reduce the set to a manageable size: a small set of FDs from which all the others can be inferred by applying the inference rules repeatedly.

Assume S1 & S2 are sets of dependencies:

If every FD in S1 can be inferred from S2 (S1 ⊆ S2+), then S2 is a cover for S1 (S1 is covered by S2).

If S2 is a cover for S1 and S1 is a cover for S2, then S1 is equivalent to S2.

Minimal Sets of Functional Dependencies

A set of functional dependencies X is minimal if it satisfies the following:

1. Every dependency in X has a single attribute on its right-hand side.

2. We cannot replace any dependency A → B in X with C → B, where C ⊂ A, and still have a set of dependencies equivalent to X.

3. We cannot remove any dependency from X and still have a set of dependencies that is equivalent to X.

Minimal Sets of Functional Dependencies

1. For each FD X → {A1, A2, ..., An}, create X → A1, X → A2, ..., X → An.

2. If A, B → C is equivalent to B → C, then replace A, B → C with B → C.

3. If X − {A → B} is equivalent to X (where X is the whole set of FDs), then remove A → B.

Question

Find the minimal set of the following FDs:

Fd1: B → A

Fd2: D → A

Fd3: A,B → D
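One way to check an answer to this question is to run the three minimal-set steps mechanically. The sketch below is an illustration rather than a definitive implementation. The function names are assumptions made for this example, and the result can depend on the order in which FDs are examined, since a set of FDs may have more than one minimal set.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs`; each FD is (lhs_set, single_attr)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, attr in fds:
            if lhs <= result and attr not in result:
                result.add(attr)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    cover = [(set(lhs), attr) for lhs, rhs in fds for attr in sorted(rhs)]
    # Step 2: drop left-hand attributes that are not needed.
    for lhs, attr in cover:
        for extra in sorted(lhs):
            if len(lhs) > 1 and attr in closure(lhs - {extra}, cover):
                lhs.discard(extra)
    # Step 3: drop FDs that can be inferred from the remaining ones.
    result = list(cover)
    for fd in cover:
        rest = [other for other in result if other is not fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result


FDS = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
print(minimal_cover(FDS))  # [({'D'}, 'A'), ({'B'}, 'D')]
```

Under these assumptions, A is redundant on the left of Fd3 (B alone already determines A), and Fd1 then follows from B → D and D → A.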

Question

Find the FDs of the relation shown below, which lists dentist/patient appointment data, given that:

• A patient is given an appointment at a specific time and date with a dentist located at a particular surgery.

• On each day of patient appointments, a dentist is allocated to a specific surgery for that day.

Dentist-patient (staffNo, dentistName, aDate, aTime, patNo, patName, surgeryNo)

The Purpose of Normalization

Normalization is a bottom-up approach to database design that begins by examining the relationships between attributes. It is performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.

Purpose:

Guarantees no redundancy due to FDs

Guarantees no update anomalies

Normal Forms:

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

The Process of Normalization

Normalization is a technique for analyzing relations based on their CKs & FDs.

[Figure: the normal forms from 1NF up to 5NF (1NF, 2NF, 3NF, BCNF, 4NF, 5NF). The higher the normal form, the stronger in format and the less vulnerable to update anomalies.]

First Normal Form (1NF)

Unnormalized form (UNF): A relation that contains one or more repeating groups.

First normal form (1NF): A relation in which the intersection of each row and column contains one & only one value.

CLIENT_PROPERTY (unnormalized relation)

ClientNo  Name           PropertyNo
CR76      John Key       PG4, PG16
CR56      Aline Stewart  PG4, PG36, PG16

UNF → 1NF: Approach 1

Expand the key so that there is a separate tuple in the original relation for each value of the repeated attribute(s). The primary key becomes the combination of the original primary key and the repeating attribute.

CLIENT_PROPERTY (1NF relation)

ClientNo  PropertyNo  Name
CR76      PG4         John Key
CR76      PG16        John Key
CR56      PG4         Aline Stewart
CR56      PG36        Aline Stewart
CR56      PG16        Aline Stewart

Disadvantage: introduces redundancy in the relation.

UNF → 1NF: Approach 2

If the maximum number of values is known for the attribute, replace the repeated attribute (PropertyNo) with a number of atomic attributes (PropertyNo1, PropertyNo2, PropertyNo3).

CLIENT_PROPERTY (1NF relation)

ClientNo  Name           PropertyNo1  PropertyNo2  PropertyNo3
CR76      John Key       PG4          PG16         NULL
CR56      Aline Stewart  PG4          PG36         PG16

Disadvantage: introduces NULL values in the relation.

UNF → 1NF: Approach 3

Remove the attribute that violates 1NF and place it in a separate relation along with a copy of the primary key.

CLIENT (1NF relation)

ClientNo  Name
CR76      John Key
CR56      Aline Stewart

PROPERTY (1NF relation)

ClientNo  PropertyNo
CR76      PG4
CR76      PG16
CR56      PG4
CR56      PG36
CR56      PG16
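Approach 3 can be sketched in a few lines of code. The example below assumes the unnormalized CLIENT_PROPERTY data is held as a list of records whose PropertyNo field is a list of values. The function name `split_repeating_group` is an assumption made for this illustration.

```python
# Unnormalized CLIENT_PROPERTY: PropertyNo is a repeating group.
client_property = [
    {"ClientNo": "CR76", "Name": "John Key",
     "PropertyNo": ["PG4", "PG16"]},
    {"ClientNo": "CR56", "Name": "Aline Stewart",
     "PropertyNo": ["PG4", "PG36", "PG16"]},
]


def split_repeating_group(rows, key, repeating):
    """Move `repeating` into its own relation with a copy of `key`."""
    parent = [{k: v for k, v in row.items() if k != repeating}
              for row in rows]
    child = [{key: row[key], repeating: value}
             for row in rows for value in row[repeating]]
    return parent, child


client, prop = split_repeating_group(client_property, "ClientNo", "PropertyNo")
for row in client:
    print(row)
for row in prop:
    print(row)
```

Every cell in both resulting relations holds a single value, and joining them on ClientNo recovers the original pairing of clients and properties.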

Full Functional Dependency

If A and B are attributes of a relation:

B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

B is partially functionally dependent on A if some attributes can be removed from A & the dependency still holds.

StaffNo, SName → BranchNo (partial dependency)

ClientNo, PropertyNo → RentDate (full dependency)

Second Normal Form (2NF)

Second normal form (2NF): A 1NF relation in which every non-prime attribute is fully (and nontrivially) functionally dependent on the PK.

Applies to relations with composite primary keys & partial dependencies.

CLIENT_RENTAL (ClientNo, PropertyNo, cName, pAddress, RentStart, RentFinish, Rent, OwnerNo, OName): 1NF relation

1NF → 2NF

1. Start with a 1NF relation.

2. Find the FDs of the relation.

3. Test the FDs whose determinant attribute is part of the PK.

CLIENT_RENTAL (ClientNo, PropertyNo, cName, pAddress, RentStart, RentFinish, Rent, OwnerNo, OName)

PK: (ClientNo, PropertyNo)

ClientNo, PropertyNo → RentStart, RentFinish (full dependency)

ClientNo → cName (partial dependency)

PropertyNo → pAddress, Rent, OwnerNo, OName (partial dependency)

OwnerNo → OName

ClientNo, RentStart → PropertyNo, pAddress, RentFinish, Rent, OwnerNo, OName

PropertyNo, RentStart → ClientNo, cName, RentFinish

1NF → 2NF

4. Remove partial dependencies by placing the functionally dependent attributes in a new relation along with a copy of their determinants.

CLIENT (ClientNo, cName): 2NF relation

RENTAL (ClientNo, PropertyNo, RentStart, RentFinish): 2NF relation

PROPERTY_OWNER (PropertyNo, pAddress, Rent, OwnerNo, OName): 2NF relation
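The 1NF → 2NF steps can be sketched as a small routine. It is a simplified illustration that treats an FD as partial when its determinant is a proper subset of the PK. The function name `to_2nf` is an assumption made for this example, and only the first four CLIENT_RENTAL dependencies are listed.

```python
ATTRS = {"ClientNo", "PropertyNo", "cName", "pAddress", "RentStart",
         "RentFinish", "Rent", "OwnerNo", "OName"}
PK = {"ClientNo", "PropertyNo"}

# (determinant, dependent) pairs for CLIENT_RENTAL.
FDS = [
    ({"ClientNo", "PropertyNo"}, {"RentStart", "RentFinish"}),
    ({"ClientNo"}, {"cName"}),
    ({"PropertyNo"}, {"pAddress", "Rent", "OwnerNo", "OName"}),
    ({"OwnerNo"}, {"OName"}),
]


def to_2nf(attrs, pk, fds):
    """Split off every FD whose determinant is a proper subset of the PK."""
    remaining = set(attrs)
    new_relations = []
    for lhs, rhs in fds:
        if lhs < pk:  # partial dependency on the PK
            moved = rhs - pk
            new_relations.append(lhs | moved)  # determinant + dependents
            remaining -= moved
    return [remaining] + new_relations


for relation in to_2nf(ATTRS, PK, FDS):
    print(sorted(relation))
```

The three attribute sets printed correspond to RENTAL, CLIENT and PROPERTY_OWNER. OwnerNo → OName is left alone at this stage because OwnerNo is not part of the PK.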

Transitive Dependency

A, B, C are attributes of a relation such that:

If A → B and B → C, then C is transitively dependent on A via B,

provided A is NOT functionally dependent on B or C (nontrivial FD).

Example:

StaffNo → BranchNo, BranchNo → Address

StaffNo → Address (transitively, via BranchNo)

Third Normal Form (3NF)

Third normal form (3NF): A 2NF relation in which NO non-prime attribute is transitively dependent on the PK.

CLIENT (ClientNo, cName): 3NF relation

RENTAL (ClientNo, PropertyNo, RentStart, RentFinish): 3NF relation

PROPERTY_OWNER (PropertyNo, pAddress, Rent, OwnerNo, OName): 2NF relation

2NF → 3NF

1. Identify the PK in the 2NF relation.

2. Identify the FDs in this relation.

3. If transitive dependencies exist, place the transitively dependent attributes in a new relation along with a copy of their determinants.

OWNER (OwnerNo, OName): 3NF relation

PROPERTY_FOR_RENT (PropertyNo, pAddress, Rent, OwnerNo): 3NF relation

Review of Decompositions

1NF: CLIENT_RENTAL

2NF: CLIENT, RENTAL, PROPERTY_OWNER

3NF: CLIENT, RENTAL, PROPERTY_FOR_RENT, OWNER

General Definition of 2NF & 3NF

Second normal form (2NF): A 1NF relation in which every attribute that is not part of a CK is fully functionally dependent on every CK.

Third normal form (3NF): A 2NF relation in which NO attribute that is not part of a CK is transitively dependent, through a nontrivial FD, on any CK.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd normal form (BCNF): A 3NF relation in which every determinant in a nontrivial FD is a CK.

Difference between 3NF & BCNF, for an FD A → B:

• 3NF allows A NOT to be a CK, provided B is part of a CK.

• BCNF insists that A is a CK.

Potential to violate BCNF may occur in a relation that:

• Contains two (or more) composite CKs.

• Has CKs that overlap (at least one attribute in common).

Boyce-Codd Normal Form (BCNF)

[Figure: dependency diagram over attributes A, B, C, D of a relation that is in 3NF but not BCNF.]

3NF → BCNF

CLIENT_INTERVIEW (ClientNo, Int_Date, Int_Time, StaffNo, RoomNo)

ClientNo, Int_Date → Int_Time, StaffNo, RoomNo

StaffNo, Int_Date, Int_Time → ClientNo

RoomNo, Int_Date, Int_Time → StaffNo, ClientNo

StaffNo, Int_Date → RoomNo

1. Examine the FDs for the relation.

2. If a determinant is NOT a CK, decompose the relation into 2 relations.

3NF → BCNF

3. Remove non-CK dependencies by placing the functionally dependent attributes in a new relation along with a copy of their determinants.

INTERVIEW (ClientNo, Int_Date, Int_Time, StaffNo): BCNF relation

STAFF_ROOM (StaffNo, Int_Date, RoomNo): BCNF relation
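The BCNF test for CLIENT_INTERVIEW can be sketched as a check that every determinant determines the whole relation. This is an illustration with assumed helper names (`closure`, `bcnf_violations`). It tests whether each determinant is a superkey, which for these four FDs gives the same answer as testing for a CK.

```python
ATTRS = {"ClientNo", "Int_Date", "Int_Time", "StaffNo", "RoomNo"}

# (determinant, dependent) pairs for CLIENT_INTERVIEW.
FDS = [
    ({"ClientNo", "Int_Date"}, {"Int_Time", "StaffNo", "RoomNo"}),
    ({"StaffNo", "Int_Date", "Int_Time"}, {"ClientNo"}),
    ({"RoomNo", "Int_Date", "Int_Time"}, {"StaffNo", "ClientNo"}),
    ({"StaffNo", "Int_Date"}, {"RoomNo"}),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose determinant does not determine all of `attrs`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]


for lhs, rhs in bcnf_violations(ATTRS, FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['Int_Date', 'StaffNo'] -> ['RoomNo']
```

Only StaffNo, Int_Date → RoomNo fails the test, which is why RoomNo moves to STAFF_ROOM together with a copy of its determinant.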

Review Example

STAFF_PROPERTY_INSPECTION (unnormalized relation)

Pno   pAddress              iDate      iTime  comments          StaffNo  sName  CarReg
PG4   Lawrence St, Glasgow  18-Oct-00  10:00  Replace crockery  SG37     Ann    M23JGR
                            22-Apr-01  09:00  Good order        SG14     David  M53HDR
                            1-Oct-01   12:00  Damp rot          SG14     David  N72HFR
PG16  5 Novar Dr., Glasgow  22-Apr-01  13:00  Replace carpet    SG14     David  M53HDR
                            24-Oct-01  14:00  Good condition    SG37     Ann    N72HFR

UNF → 1NF

STAFF_PROPERTY_INSPECTION (1NF)

Pno   pAddress              iDate      iTime  comments          StaffNo  sName  CarReg
PG4   Lawrence St, Glasgow  18-Oct-00  10:00  Replace crockery  SG37     Ann    M23JGR
PG4   Lawrence St, Glasgow  22-Apr-01  09:00  Good order        SG14     David  M53HDR
PG4   Lawrence St, Glasgow  1-Oct-01   12:00  Damp rot          SG14     David  N72HFR
PG16  5 Novar Dr., Glasgow  22-Apr-01  13:00  Replace carpet    SG14     David  M53HDR
PG16  5 Novar Dr., Glasgow  24-Oct-01  14:00  Good condition    SG37     Ann    N72HFR

1NF → 2NF

STAFF_PROPERTY_INSPECTION (Pno, iDate, iTime, pAddress, comments, StaffNo, sName, CarReg)

Pno, iDate → iTime, comments, StaffNo, sName, CarReg

Pno → pAddress (partial dependency)

StaffNo → sName

iDate, StaffNo → CarReg

iDate, iTime, CarReg → Pno, pAddress, comments, StaffNo, sName

iDate, iTime, StaffNo → Pno, pAddress, comments

1NF → 2NF

PROPERTY (Pno, pAddress): 2NF

PROPERTY_INSPECTION (Pno, iDate, iTime, comments, StaffNo, sName, CarReg): 2NF

Pno, iDate → iTime, comments, StaffNo, sName, CarReg

StaffNo → sName (transitive dependency)

iDate, StaffNo → CarReg

iDate, iTime, CarReg → Pno, comments, StaffNo, sName

iDate, iTime, StaffNo → Pno, comments

2NF → 3NF

PROPERTY (Pno, pAddress): 3NF

STAFF (StaffNo, sName): 3NF

PROPERTY_INSPECT (Pno, iDate, iTime, comments, StaffNo, CarReg): 3NF

3NF → BCNF

PROPERTY_INSPECT (Pno, iDate, iTime, comments, StaffNo, CarReg): 3NF

Pno, iDate → iTime, comments, StaffNo, CarReg

StaffNo, iDate → CarReg

CarReg, iDate, iTime → Pno, comments, StaffNo

StaffNo, iDate, iTime → Pno, comments

Relations after decomposition:

PROPERTY (Pno, pAddress)

STAFF (StaffNo, sName)

STAFF_CAR (StaffNo, iDate, CarReg)

PROPERTY_INSPECT (Pno, iDate, iTime, comments, StaffNo)

Multi-Valued Dependency (MVD)

Represents a dependency between attributes A, B, C in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B & C are independent of each other.

Denoted by: A →→ B, A →→ C

Example:

BranchNo →→ SName, BranchNo →→ OName

BRANCH_STAFF_OWNER

BranchNo  SName  OName
B003      Ann    Carol
B003      David  Carol
B003      Ann    Tina
B003      David  Tina
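The independence of the two value sets can be checked on sample data: for each value of A, the tuples must contain every combination of the B values with the remaining values. The sketch below is illustrative only, and the function name `mvd_holds` is an assumption made for this example.

```python
from itertools import product


def mvd_holds(rows, a, b):
    """Check A ->> B on `rows`: within each A-group, the B values and
    the remaining attribute values must appear in every combination."""
    rest = [x for x in rows[0] if x not in a and x not in b]
    groups = {}
    for row in rows:
        key = tuple(row[x] for x in a)
        pair = (tuple(row[x] for x in b), tuple(row[x] for x in rest))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        rest_values = {p[1] for p in pairs}
        if pairs != set(product(b_values, rest_values)):
            return False
    return True


branch_staff_owner = [
    {"BranchNo": "B003", "SName": "Ann", "OName": "Carol"},
    {"BranchNo": "B003", "SName": "David", "OName": "Carol"},
    {"BranchNo": "B003", "SName": "Ann", "OName": "Tina"},
    {"BranchNo": "B003", "SName": "David", "OName": "Tina"},
]

print(mvd_holds(branch_staff_owner, ["BranchNo"], ["SName"]))      # True
print(mvd_holds(branch_staff_owner[:3], ["BranchNo"], ["SName"]))  # False
```

Dropping the last tuple breaks the dependency, which shows the update problem: adding one staff member or owner to a branch requires inserting several tuples.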

Trivial MVD

A →→ B is a trivial MVD if:

B ⊆ A

OR

A ∪ B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF): A BCNF relation with NO nontrivial MVD.

BRANCH_STAFF_OWNER (BCNF relation)

BranchNo  SName  OName
B003      Ann    Carol
B003      David  Carol
B003      Ann    Tina
B003      David  Tina

BCNF → 4NF

1. Start with a BCNF relation.

2. Examine the dependencies (FDs and MVDs) of the relation.

3. If a nontrivial MVD exists, remove the MVD by placing the attributes in a new relation along with a copy of their determinant.

BRANCH_STAFF (4NF)

BranchNo  SName
B003      Ann
B003      David

BRANCH_OWNER (4NF)

BranchNo  OName
B003      Carol
B003      Tina

Lossless-Join Dependency

A property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Objectives:

Preserve all the data in the original relation

Do not create additional spurious tuples
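Both objectives can be checked on sample data by projecting a relation onto the two attribute sets, joining the projections, and comparing the result with the original. The sketch below uses a cut-down STAFFBRANCH in which branches B005 and B007 are both in London. The helper names are assumptions made for this example.

```python
def project(rows, attrs):
    """Project `rows` onto `attrs`, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Join two non-empty relations on their common attributes."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly `rows`."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    canon = lambda rs: {tuple(sorted(r.items())) for r in rs}
    return canon(joined) == canon(rows)


staffbranch = [
    {"StaffNo": "SL21", "BrnNo": "B005", "City": "London"},
    {"StaffNo": "SG37", "BrnNo": "B003", "City": "Glasgow"},
    {"StaffNo": "SA9", "BrnNo": "B007", "City": "London"},
]

print(is_lossless(staffbranch, ["StaffNo", "BrnNo"], ["BrnNo", "City"]))  # True
print(is_lossless(staffbranch, ["StaffNo", "City"], ["BrnNo", "City"]))   # False
```

Splitting on BrnNo is lossless because BrnNo determines City. Splitting on City pairs each London staff member with both London branches and so creates spurious tuples.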

Join Dependency

A relation R with subsets of attributes A, B, ..., Z satisfies a join dependency if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Fifth Normal Form (5NF)

Fifth normal form (5NF): A 4NF relation with NO join dependency.

PROPERTY_ITEM_SUPPLIER (illegal state)

PropertyNo  I_Description  SupplierNo
PG4         Bed            S1
PG4         Chair          S2
PG16        Bed            S2

4NF → 5NF

PROPERTY_ITEM

PropertyNo  I_Description
PG4         Bed
PG4         Chair
PG16        Bed

ITEM_SUPPLIER

I_Description  SupplierNo
Bed            S1
Chair          S2
Bed            S2

PROPERTY_SUPPLIER

PropertyNo  SupplierNo
PG4         S1
PG4         S2
PG16        S2

PROPERTY_ITEM_SUPPLIER (legal state)

PropertyNo  I_Description  SupplierNo
PG4         Bed            S1
PG4         Chair          S2
PG16        Bed            S2
PG4         Bed            S2
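The difference between the illegal and the legal state can be reproduced by joining the three binary projections. The sketch below assumes PROPERTY_ITEM_SUPPLIER is subject to the join dependency over (PropertyNo, I_Description), (I_Description, SupplierNo) and (PropertyNo, SupplierNo). The function name is an assumption made for this example.

```python
def join_of_projections(rows):
    """Join the three binary projections of (property, item, supplier) rows."""
    property_item = {(p, i) for p, i, s in rows}
    item_supplier = {(i, s) for p, i, s in rows}
    property_supplier = {(p, s) for p, i, s in rows}
    return {(p, i, s)
            for p, i in property_item
            for i2, s in item_supplier
            if i == i2 and (p, s) in property_supplier}


illegal = {("PG4", "Bed", "S1"), ("PG4", "Chair", "S2"), ("PG16", "Bed", "S2")}
legal = illegal | {("PG4", "Bed", "S2")}

print(join_of_projections(illegal) == illegal)  # False
print(join_of_projections(illegal) == legal)    # True
print(join_of_projections(legal) == legal)      # True
```

The three-tuple state is not equal to the join of its projections, so it violates the join dependency. Adding (PG4, Bed, S2) gives a state that the join reproduces exactly.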

Question

Given the following Dentist-patient database schema:

Dentist-patient (staffNo, dentistName, aDate, aTime, patNo, patName, surgeryNo)

Normalize the above relation, showing appropriate dependency diagrams to justify the decomposition.