Database Design: Normalization Reading: C&B, Chaps 14

Dept. of Computer Science, University of Aberdeen 2

In this lecture you will learn

• Mathematical notions behind relational model

• Normalization

Introduction

• Relations derived from an ER model may be ‘faulty’
  – They may cause data redundancy and insert/delete/update anomalies
• We use some mathematical (semantic?) properties of relations to
  – locate these faults, and
  – fix them

Mathematical notions behind relational model

• Set – a collection of objects characterized by some defining property
  – E.g. a column in a database table, such as the last names of all staff
• Cross product of sets – one of the operations (×) on sets
  – E.g. consider two sets: the set of all first names and the set of all last names in the staff table
  – fName = {Mary, David}
  – lName = {Howe, Ford}
  – fName × lName = {(Mary, Howe), (Mary, Ford), (David, Howe), (David, Ford)}
• Relation – defined between two sets; it is a subset of the cross product of those two sets
  – E.g. FirstNameOf = {(Mary, Howe), (David, Ford)}
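
The set notions above can be tried out directly. This is a minimal sketch in Python, using the names from the slide; the variable names are illustrative only.

```python
from itertools import product

f_name = {"Mary", "David"}
l_name = {"Howe", "Ford"}

# Cross product: every possible (first name, last name) pair.
cross = set(product(f_name, l_name))

# A relation is any subset of the cross product.
first_name_of = {("Mary", "Howe"), ("David", "Ford")}

print(len(cross))              # 4
print(first_name_of <= cross)  # True: the relation is a subset of the cross product
```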

Relational model

• The name ‘relational model’ comes from this mathematical notion of relation
  – A relation is a set (collection) of tuples that hold related objects, such as the first name and last name of the same person
  – E.g. (fName, lName) is a relation
• We can have relations over any number of sets
  – E.g. (staffNo, fName, lName, position)
• In general we can denote a relation as (A, B, C, D, …, Z), where A, B, C, …, Z are all its attribute sets

Function

• A function is a special kind of relation
• In a relation (X, Y), if every value of X is associated with exactly one value of Y, then we say Y is a function of X
  – E.g. the relation {(1,2), (2,4), (3,6), (4,8)} is a function, Y = 2*X for 0 < X < 5

[Figure: arrows mapping X = {1, 2, 3, 4} to Y = {2, 4, 6, 8}. Only one arrow can start from any single value in X.]

Functional Dependency

• If Y is a function of X
  – Y is dependent on X
  – there is a relationship of functional dependency between Y and X
• In databases, we work with relations in the general form (A, B, C, D, …, Z)
• Functional dependency
  – Describes the relationship between attributes in a relation
  – If A and B are attributes of relation R, B is functionally dependent on A if each value of A in R is associated with exactly one value of B in R
• We are interested in finding such functional dependencies among database relations
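
As a rough illustration of the definition above, the following Python sketch checks whether a dependency holds in one particular set of rows. The helper name `holds_fd` and the staffNo and position values are assumptions made for this example. Sample rows can refute a dependency but cannot prove it, because a functional dependency is a statement about what the attributes mean.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in this set of rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault records the first dependent value seen for this determinant.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Attribute names come from the slides; the row values are hypothetical.
staff = [
    {"staffNo": "S1", "fName": "Mary", "lName": "Howe", "position": "Manager"},
    {"staffNo": "S2", "fName": "David", "lName": "Ford", "position": "Assistant"},
    {"staffNo": "S3", "fName": "Ann", "lName": "Beech", "position": "Assistant"},
]

print(holds_fd(staff, ["staffNo"], ["position"]))  # True
print(holds_fd(staff, ["position"], ["staffNo"]))  # False: two staff share a position
```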

Functional Dependency

• Is a property of the meaning (or semantics) of the attributes in a relation
• Diagrammatic representation: A -> B
• The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow
• If the determinant contains only the minimum number of attributes needed to maintain the functional dependency (no attribute can be removed from it), we call it a full functional dependency

Data Redundancy

• A major aim of relational database design is
  – to group attributes into relations to minimize data redundancy, and
  – to reduce the file storage space required by base relations
• Data redundancy is undesirable because of the following anomalies
  – ‘Insert’ anomalies
  – ‘Delete’ anomalies
  – ‘Update’ anomalies
• We illustrate these anomalies with an example

Data Redundancy

[Figure: the StaffBranch example relation]

Anomalies

• Insert anomalies
  – Try to insert details for a new member of staff into StaffBranch
  – You also need to insert branch details that are consistent with existing details for the same branch
  – Hard to maintain data consistency with StaffBranch
• Delete anomalies
  – Try to delete details for a member of staff from StaffBranch
  – You also lose the branch details in that tuple (row)
• Update anomalies
  – Try to update the value of one of the attributes of a branch
  – You also need to update that information in all the tuples about the same branch
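
The update and delete anomalies can be reproduced with a few rows. This is a sketch only: the StaffBranch columns (staffNo, sName, branchNo, bAddress) and all row values are hypothetical, since the example table is not included in this text.

```python
# Hypothetical StaffBranch rows: (staffNo, sName, branchNo, bAddress)
staff_branch = [
    ("S1", "Mary Howe", "B1", "1 High St"),
    ("S2", "David Ford", "B1", "1 High St"),
    ("S3", "Ann Beech", "B2", "9 Mill Rd"),
]


def branch_addresses(rows):
    """Collect every address recorded for each branch."""
    addresses = {}
    for _staff_no, _name, branch_no, address in rows:
        addresses.setdefault(branch_no, set()).add(address)
    return addresses


# Update anomaly: changing B1's address in only one tuple leaves two versions.
updated = [("S1", "Mary Howe", "B1", "5 New St")] + staff_branch[1:]
print(sorted(branch_addresses(updated)["B1"]))  # ['1 High St', '5 New St']

# Delete anomaly: removing the only member of staff at B2 also removes B2's details.
after_delete = [row for row in staff_branch if row[0] != "S3"]
print("B2" in branch_addresses(after_delete))  # False
```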

Decomposition of Relations

• The Staff and Branch relations, which are obtained by decomposing StaffBranch, do not suffer from these anomalies
• Two important properties of decomposition
  – The lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations
  – The dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations
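
One way to see the lossless-join property is to decompose a relation by projection and join the parts back together. The sketch below assumes a hypothetical StaffBranch instance and models each tuple as a frozenset of (attribute, value) pairs; the helper names are illustrative, not from the lecture.

```python
def make_relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(relation, attrs):
    """Keep only the given attributes; duplicates vanish because the result is a set."""
    result = set()
    for t in relation:
        row = dict(t)
        result.add(frozenset((a, row[a]) for a in attrs))
    return result


def natural_join(r, s):
    """Combine tuples that agree on all attributes the two relations share."""
    joined = set()
    for t in r:
        for u in s:
            a, b = dict(t), dict(u)
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                joined.add(frozenset({**a, **b}.items()))
    return joined


# Hypothetical data for illustration.
staff_branch = make_relation(
    ("staffNo", "sName", "branchNo", "bAddress"),
    [
        ("S1", "Mary Howe", "B1", "1 High St"),
        ("S2", "David Ford", "B1", "1 High St"),
        ("S3", "Ann Beech", "B2", "9 Mill Rd"),
    ],
)

staff = project(staff_branch, ("staffNo", "sName", "branchNo"))
branch = project(staff_branch, ("branchNo", "bAddress"))
print(natural_join(staff, branch) == staff_branch)  # True: lossless

# Dropping the shared attribute makes the join lossy (spurious tuples appear).
names = project(staff_branch, ("staffNo", "sName"))
print(len(natural_join(names, branch)))  # 6, not 3
```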

The Process of Normalization

• Formal technique for analyzing a relation based on its primary key and functional dependencies between its attributes.

• Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

• As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

• Given a relation, use the following cycle
  – Find out what normal form it is in
  – Transform the relation to the next higher form by decomposing it to form simpler relations
  – You may need to refine the relation further if decomposition resulted in undesirable properties

Unnormalized Form (UNF)

• A table that contains one or more repeating groups.

• To create an unnormalized table:
  – transform data from the information source (e.g. a form) into table format with columns and rows

Name | Address | Phone
Sally Singer | 123 Broadway New York, NY, 11234 | (111) 222-3345
Jason Jumper | 456 Jolly Jumper St. Trenton NJ, 11547 | (222) 334-5566

Example 1 – address and name fields are composite

Another example of UNF

Rep ID | Representative | Client 1 | Time 1 | Client 2 | Time 2 | Client 3 | Time 3
TS-89 | Gilroy Gladstone | US Corp. | 14 hrs | Taggarts | 26 hrs | Kilroy Inc. | 9 hrs
RK-56 | Mary Mayhem | Italiana | 67 hrs | Linkers | 2 hrs | |

Example 2 – repeating columns for each client & composite name field

First Normal Form (1NF)

• A relation in which the intersection of each row and column contains one and only one value
• UNF to 1NF
  – Nominate an attribute or group of attributes to act as the key for the unnormalized table
  – Identify the repeating group(s) in the unnormalized table that repeat for the key attribute(s)

UNF to 1NF

• Remove the repeating group by:
  – entering appropriate data into the empty columns of rows containing repeating data (‘flattening’ the table), or by
  – placing the repeating data, along with a copy of the original key attribute(s), into a separate relation

Example 1

ID | First | Last | Street | City | State | Zip | Phone
564 | Sally | Singer | 123 Broadway | New York | NY | 11234 | (111) 222-3345
565 | Jason | Jumper | 456 Jolly Jumper St. | Trenton | NJ | 11547 | (222) 334-5566

• Address field has been expressed in terms of its constituent parts, such as street, city and postcode
• Name field has been expressed in terms of last name and first name

Example 2

Rep ID | Rep First Name | Rep Last Name | Client | Time With Client
TS-89 | Gilroy | Gladstone | US Corp | 14 hrs
TS-89 | Gilroy | Gladstone | Taggarts | 26 hrs
TS-89 | Gilroy | Gladstone | Kilroy Inc. | 9 hrs
RK-56 | Mary | Mayhem | Italiana | 67 hrs
RK-56 | Mary | Mayhem | Linkers | 2 hrs

• Table structure has been changed
• Data related to the representative is repeated
• Representative name expressed in terms of last name and first name
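
The ‘flattening’ step for Example 2 can be sketched in Python as follows. The input rows are copied from the UNF table. The function name `to_1nf` and the choice to split the name at the first space are assumptions made for illustration.

```python
# UNF rows: Rep ID, Representative, then repeating (Client, Time) column pairs.
unf_rows = [
    ("TS-89", "Gilroy Gladstone", "US Corp.", "14 hrs", "Taggarts", "26 hrs", "Kilroy Inc.", "9 hrs"),
    ("RK-56", "Mary Mayhem", "Italiana", "67 hrs", "Linkers", "2 hrs", None, None),
]


def to_1nf(rows):
    """Emit one row per (representative, client) pair and split the composite name."""
    flat = []
    for rep_id, name, *groups in rows:
        # Assumption: the first space separates first name from last name.
        first, last = name.split(" ", 1)
        # groups alternates client, time, client, time, ...
        for client, time_with_client in zip(groups[0::2], groups[1::2]):
            if client is not None:  # skip unused repeating columns
                flat.append((rep_id, first, last, client, time_with_client))
    return flat


for row in to_1nf(unf_rows):
    print(row)
```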

Example 2

Rep ID* | Rep First Name | Rep Last Name | Client ID* | Client | Time With Client
TS-89 | Gilroy | Gladstone | 978 | US Corp | 14 hrs
TS-89 | Gilroy | Gladstone | 665 | Taggarts | 26 hrs
TS-89 | Gilroy | Gladstone | 782 | Kilroy Inc. | 9 hrs
RK-56 | Mary | Mayhem | 221 | Italiana | 67 hrs
RK-56 | Mary | Mayhem | 982 | Linkers | 2 hrs

• A new field ClientID introduced
• The combination of RepID and ClientID acts as the primary key

Second Normal Form (2NF)

• Based on the concept of full functional dependency:
  – A and B are attributes of a relation R
  – B is fully dependent on A (denoted A -> B) if B is functionally dependent on A but not on any proper subset of A
• 2NF – a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key
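
To make ‘not on any proper subset of A’ concrete, the sketch below looks for non-key attributes that are determined by only part of a composite key. The rows are the 1NF Example 2 table plus the (RK-56, 665, 4 hrs) pairing that appears in the 2NF example. The column and function names are assumptions. As with any check on sample data, the result only suggests dependencies; it does not prove them.

```python
from itertools import combinations

COLUMNS = ("rep_id", "first_name", "last_name", "client_id", "client", "time")
DATA = [
    ("TS-89", "Gilroy", "Gladstone", 978, "US Corp", "14 hrs"),
    ("TS-89", "Gilroy", "Gladstone", 665, "Taggarts", "26 hrs"),
    ("TS-89", "Gilroy", "Gladstone", 782, "Kilroy Inc.", "9 hrs"),
    ("RK-56", "Mary", "Mayhem", 221, "Italiana", "67 hrs"),
    ("RK-56", "Mary", "Mayhem", 982, "Linkers", "2 hrs"),
    ("RK-56", "Mary", "Mayhem", 665, "Taggarts", "4 hrs"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this set of rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (subset, attribute) pairs where a proper subset of the key determines the attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if holds_fd(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


key = ("rep_id", "client_id")
non_key = ("first_name", "last_name", "client", "time")
for subset, attr in partial_dependencies(rows, key, non_key):
    print(subset, "->", attr)
```

With these rows, the names depend on rep_id alone and the client name on client_id alone, while the time needs the whole key. That is the split used in the 2NF example.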

1NF to 2NF

• Identify primary key for the 1NF relation.

• Identify functional dependencies in the relation.

• If partial dependencies on the primary key exist, remove them by placing the partially dependent attributes in a new relation along with a copy of their determinant.

Example 2NF

Rep ID* | Client ID* | Time With Client
TS-89 | 978 | 14 hrs
TS-89 | 665 | 26 hrs
TS-89 | 782 | 9 hrs
RK-56 | 221 | 67 hrs
RK-56 | 982 | 2 hrs
RK-56 | 665 | 4 hrs

Rep ID* | First Name | Last Name
TS-89 | Gilroy | Gladstone
RK-56 | Mary | Mayhem

Client ID* | Client Name
978 | US Corp
665 | Taggarts
782 | Kilroy Inc.
221 | Italiana
982 | Linkers

• Original table decomposed into smaller tables
• Each of them is in 2NF
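
The three 2NF tables can be declared and queried with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names are invented for the example, and the data is taken from the tables above. Joining the three tables on their keys reassembles the wide rows without storing any name twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE rep (
        rep_id     TEXT PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE client (
        client_id   INTEGER PRIMARY KEY,
        client_name TEXT NOT NULL
    );
    CREATE TABLE rep_client (
        rep_id           TEXT    NOT NULL REFERENCES rep(rep_id),
        client_id        INTEGER NOT NULL REFERENCES client(client_id),
        time_with_client TEXT    NOT NULL,
        PRIMARY KEY (rep_id, client_id)
    );
""")

conn.executemany("INSERT INTO rep VALUES (?, ?, ?)", [
    ("TS-89", "Gilroy", "Gladstone"),
    ("RK-56", "Mary", "Mayhem"),
])
conn.executemany("INSERT INTO client VALUES (?, ?)", [
    (978, "US Corp"), (665, "Taggarts"), (782, "Kilroy Inc."),
    (221, "Italiana"), (982, "Linkers"),
])
conn.executemany("INSERT INTO rep_client VALUES (?, ?, ?)", [
    ("TS-89", 978, "14 hrs"), ("TS-89", 665, "26 hrs"), ("TS-89", 782, "9 hrs"),
    ("RK-56", 221, "67 hrs"), ("RK-56", 982, "2 hrs"), ("RK-56", 665, "4 hrs"),
])

# Rebuild the wide (1NF-style) rows by joining on the keys.
rows = conn.execute("""
    SELECT rc.rep_id, r.first_name, r.last_name,
           rc.client_id, c.client_name, rc.time_with_client
    FROM rep_client AS rc
    JOIN rep AS r ON r.rep_id = rc.rep_id
    JOIN client AS c ON c.client_id = rc.client_id
    ORDER BY rc.rep_id, rc.client_id
""").fetchall()

for row in rows:
    print(row)
```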

Third Normal Form (3NF)

• Based on the concept of transitive dependency:
  – A, B and C are attributes of a relation such that if A -> B and B -> C,
  – then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C)
• 3NF – a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key

2NF to 3NF

• Identify the primary key in the 2NF relation.

• Identify functional dependencies in the relation.

• If transitive dependencies on the primary key exist, remove them by placing the transitively dependent attributes in a new relation along with a copy of their determinant.
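
The steps above can be sketched on a relation schema and a declared list of functional dependencies. This is a simplified illustration, not a general 3NF algorithm: it assumes each determinant lies either entirely inside or entirely outside the primary key, and it makes a single pass. The StaffBranch attribute names and dependencies are hypothetical.

```python
def transitive_dependencies(primary_key, fds):
    """Return (determinant, attribute) pairs where a non-key determinant fixes a non-key attribute."""
    key = set(primary_key)
    found = []
    for determinant, dependents in fds:
        if key & set(determinant):
            continue  # the determinant involves the key, so this is a direct dependency
        for attr in dependents:
            if attr not in key:
                found.append((determinant, attr))
    return found


def to_3nf(attributes, primary_key, fds):
    """Move transitively dependent attributes into new relations with their determinant."""
    remaining = list(attributes)
    new_relations = {}
    for determinant, attr in transitive_dependencies(primary_key, fds):
        if attr in remaining:
            remaining.remove(attr)
            new_relations.setdefault(determinant, list(determinant)).append(attr)
    return [remaining] + list(new_relations.values())


# Hypothetical schema: staffNo -> branchNo and branchNo -> bAddress.
attributes = ["staffNo", "sName", "branchNo", "bAddress"]
primary_key = ("staffNo",)
fds = [
    (("staffNo",), ("sName", "branchNo", "bAddress")),
    (("branchNo",), ("bAddress",)),
]

print(to_3nf(attributes, primary_key, fds))
# [['staffNo', 'sName', 'branchNo'], ['branchNo', 'bAddress']]
```

The determinant branchNo stays in the first relation as well, so the two relations can still be joined back together.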

Normalization Flow

UNF
  ↓ Remove repeating groups
1NF
  ↓ Remove partial dependencies
2NF
  ↓ Remove transitive dependencies
3NF
  ↓
More normalized forms

Conclusion

• Quality of the relations derived from ER models is unknown

• Normalization is a systematic process of either assessing or converting these relations into progressively stricter normal forms

• Advanced normal forms such as Boyce-Codd normal form (BCNF), 4NF and 5NF exist