Lecture 9: Functional Dependencies and Normalization for Relational Databases

Prepared by L. Nouf Almujally

Ref. Chapters 14-15


How to produce a good relation schema?

STEPS:

1. Start with a set of relations.
2. Define the functional dependencies for each relation to specify the PK.
3. Transform the relations to normal form.


Functional Dependencies

• Describes the relationship between attributes in a relation.

• If A and B are attributes of relation R, B is functionally dependent on A, denoted A -> B, if each value of A is associated with exactly one value of B. A single value of B may be associated with several values of A.

• In A -> B, A is the determinant and B is the dependent.


• X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y.

• For any two tuples t and u in any relation instance r(R): if t[X] = u[X], then t[Y] = u[Y].

(Figure: two tuples t and u over columns X and Y. If t and u agree on X, then they must agree on Y.)


Example

• StaffNo -> position: position is functionally dependent on StaffNo (for example, SL21 determines Manager). This is a 1:1 or M:1 relationship between attributes in a relation.

• position -> StaffNo does not hold: StaffNo is NOT functionally dependent on position (for example, Manager is associated with both SL21 and SG5). This is a 1:M relationship between attributes in a relation.
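The definition and the StaffNo/position example above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `fd_holds` and the dictionary-per-tuple representation are assumptions made for illustration. Note that a relation instance can only refute an FD; an FD is a constraint on every legal instance, so a passing check on sample data does not prove it.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is not violated by this relation instance."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value.
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Staff values taken from the slide: SL21 and SG5 are both managers.
staff = [
    {"StaffNo": "SL21", "position": "Manager"},
    {"StaffNo": "SG5", "position": "Manager"},
]

print(fd_holds(staff, ["StaffNo"], ["position"]))  # True
print(fd_holds(staff, ["position"], ["StaffNo"]))  # False
```

The second call fails because the two tuples agree on position but differ on StaffNo, which is exactly the 1:M case.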


Examples of FD constraints

• Social security number determines employee name
  • SSN -> ENAME

• Project number determines project name and location
  • PNUMBER -> {PNAME, PLOCATION}

• Employee SSN and project number determine the hours per week that the employee works on the project
  • {SSN, PNUMBER} -> HOURS


Identifying the PK

• The purpose of functional dependencies is to specify the set of integrity constraints that must hold on a relation.

• The determinant attribute(s) are a candidate key of the relation if:
  • There is a 1:1 relationship between determinant and dependent.
  • No subset of the determinant attribute(s) is itself a determinant (nontrivial). If (A, B) -> C, then NOT A -> B, and NOT B -> A.
  • All attributes that are not part of the CK are functionally dependent on the key: CK -> all attributes of R.
  • The dependency holds for all time.

• The PK is the candidate key with the minimal set of functional dependencies.
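One common way to test the candidate-key conditions above is attribute closure, a standard technique that these slides do not present explicitly. The sketch below is illustrative only: the names `closure` and `is_candidate_key` are assumptions, and the FDs are the three example constraints given earlier (SSN, PNUMBER, HOURS).

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(attrs: Iterable[str], all_attrs: Set[str], fds: List[FD]) -> bool:
    """True if attrs determines every attribute and no attribute can be dropped."""
    key = set(attrs)
    if not closure(key, fds) >= all_attrs:
        return False  # does not determine all attributes of R
    # Minimality: dropping any single attribute must break the key.
    return all(not closure(key - {a}, fds) >= all_attrs for a in key)


R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]

print(is_candidate_key({"SSN", "PNUMBER"}, R, fds))           # True
print(is_candidate_key({"SSN"}, R, fds))                      # False
print(is_candidate_key({"SSN", "PNUMBER", "HOURS"}, R, fds))  # False (not minimal)
```

Checking only the subsets with one attribute removed is enough for minimality, because any smaller superkey would also make one of those subsets a superkey.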


• If a relation schema has more than one key, each is called a candidate key.

• One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

• A prime attribute is a member of some candidate key.

• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.


The Purpose of Normalization

• Normalization is a bottom-up approach to database design that begins by examining the relationships between attributes. It is performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.

• Purpose:
  • Guarantees no redundancy due to FDs
  • Guarantees no update anomalies

• Normal Forms:
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)


Normal Forms Defined Informally

• 1st normal form
  • All attributes depend on the key

• 2nd normal form
  • All attributes depend on the whole key

• 3rd normal form
  • All attributes depend on nothing but the key


First Normal Form (1NF)

Unnormalized form (UNF): A relation that contains one or more repeating groups.

First normal form (1NF): A relation in which the intersection of each row and column contains one and only one value.

1NF disallows:

• composite attributes
• multivalued attributes
• nested relations; attributes whose values for an individual tuple are non-atomic


CLIENT_PROPERTY (unnormalized form, UNF)

ClientNo  Name           PropertyNo
CR76      John Key       PG4, PG16
CR56      Aline Stewart  PG4, PG36, PG16

Not in 1NF because the table contains a multivalued attribute (PropertyNo).


UNF -> 1NF: Approach 1

• Expand the key so that there is a separate tuple in the original relation for each value of the repeated attribute(s).

• The primary key becomes the combination of the original primary key and the repeated (multivalued) attribute.

• Disadvantage: introduces redundancy in the relation.

CLIENT_PROPERTY (1NF relation)

ClientNo  PropertyNo  Name
CR76      PG4         John Key
CR76      PG16        John Key
CR56      PG4         Aline Stewart
CR56      PG36        Aline Stewart
CR56      PG16        Aline Stewart
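Approach 1 amounts to emitting one tuple per value of the repeated attribute. A minimal sketch follows, using the CLIENT_PROPERTY values from the slide; the function name `to_1nf` and the list-valued representation of the multivalued attribute are assumptions made for illustration.

```python
from typing import Any, Dict, List

# UNF relation: PropertyNo holds a list of values per client.
unf: List[Dict[str, Any]] = [
    {"ClientNo": "CR76", "Name": "John Key", "PropertyNo": ["PG4", "PG16"]},
    {"ClientNo": "CR56", "Name": "Aline Stewart", "PropertyNo": ["PG4", "PG36", "PG16"]},
]


def to_1nf(rows: List[Dict[str, Any]], multivalued: str) -> List[Dict[str, Any]]:
    """Emit one tuple per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = dict(row)          # copy the non-repeating attributes
            new_row[multivalued] = value  # replace the list with a single atomic value
            flat.append(new_row)
    return flat


client_property = to_1nf(unf, "PropertyNo")
for row in client_property:
    print(row["ClientNo"], row["PropertyNo"], row["Name"])
```

The result has five tuples, and (ClientNo, PropertyNo) is unique in each, which matches the expanded key described above. The repeated Name values show the redundancy this approach introduces.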


UNF -> 1NF: Approach 2

• If the maximum number of values is known for the attribute, replace the repeated attribute (PropertyNo) with a number of atomic attributes (PropertyNo1, PropertyNo2, PropertyNo3).

• Disadvantage: introduces NULL values in the relation.

CLIENT_PROPERTY (1NF relation)

ClientNo  Name           PropertyNo1  PropertyNo2  PropertyNo3
CR76      John Key       PG4          PG16         NULL
CR56      Aline Stewart  PG4          PG36         PG16


Summary: First Normal Form (1NF)

• A relation is in 1NF if all attribute values are atomic: no repeating groups and no composite attributes.

(Figures: converting a UNF relation with multivalued attributes to 1NF; converting a UNF relation with nested relations to 1NF.)

Example: First Normal Form (1NF)

The following table is not in 1NF because it contains nested relations:

DPT_NO  MG_NO  EMP_NO               EMP_NM
D101    12345  20000, 20001, 20002  Carl Sagan, Mag James, Larry Bird
D102    13456  30000, 30001         Jim Carter, Paul Simon

Table in 1NF

• All attribute values are atomic because there are no repeating groups and no composite attributes.

DPT_NO  MG_NO  EMP_NO  EMP_NM
D101    12345  20000   Carl Sagan
D101    12345  20001   Mag James
D101    12345  20002   Larry Bird
D102    13456  30000   Jim Carter
D102    13456  30001   Paul Simon

Second Normal Form

• Uses the concepts of FDs and primary key

• Definitions:
  • Prime attribute: an attribute that is a member of the primary key K (more generally, of any candidate key)
  • Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD no longer holds

• Examples:
  • {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
  • {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds
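The full versus partial distinction above can be checked by asking whether a proper subset of the key already determines a non-key attribute. The sketch below is an illustration under stated assumptions: the helper names `closure` and `partial_dependencies` are hypothetical, the FD set combines the SSN/PNUMBER examples from these slides, and {SSN, PNUMBER} is assumed to be the only candidate key.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(
    key: Iterable[str], all_attrs: Set[str], fds: List[FD]
) -> List[Tuple[Tuple[str, ...], str]]:
    """Pairs (key subset, attribute) where a proper subset of the key determines a non-key attribute."""
    key_attrs = sorted(set(key))
    non_key = all_attrs - set(key_attrs)
    found = []
    for size in range(1, len(key_attrs)):  # proper, non-empty subsets only
        for subset in combinations(key_attrs, size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found


R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]

for subset, attr in partial_dependencies({"SSN", "PNUMBER"}, R, fds):
    print(subset, "->", attr)
```

HOURS never appears in the output, because it needs the whole key; ENAME, PNAME and PLOCATION do, because part of the key is enough to determine them.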


• Second normal form (2NF) further addresses the concept of removing duplicative data.

• A relation R is in 2NF if:
  1. R is in 1NF, and
  2. All non-prime attributes are fully functionally dependent on the candidate keys.

• A relation is brought into 2NF by moving partially dependent attributes into new tables and creating relationships between these new tables and their predecessors through the use of foreign keys.

• A prime attribute appears in a candidate key.

• There is no partial dependency in 2NF.


Summary: Second Normal Form (2NF)

1) Meet all the requirements of 1NF.
2) Remove columns that are not fully dependent upon the primary key.


Example 1: 1NF -> 2NF

Remove partial dependencies by placing the functionally dependent attributes in a new relation along with a copy of their determinants.


Example 2: Second Normal Form (2NF)

Inventory (Description, Supplier, Cost, Supplier Address)

The PK is (Description, Supplier). There are two non-key fields, Cost and Supplier Address. So, here are the questions for Cost:

• If I know just Description, can I find out Cost? No, because we have more than one supplier for the same product.

• If I know just Supplier, can I find out Cost? No, because I need to know what the item is as well.

Therefore, Cost is fully functionally dependent upon the ENTIRE PK (Description, Supplier).


Inventory (Description, Supplier, Cost, Supplier Address)

The questions for Supplier Address:

• If I know just Description, can I find out Supplier Address? No, because we have more than one supplier for the same product.

• If I know just Supplier, can I find out Supplier Address? Yes. The address does not depend upon the description of the item.

Therefore, Supplier Address is NOT fully functionally dependent upon the ENTIRE PK (Description, Supplier). It depends on Supplier alone (a partial dependency), so it is moved to a new relation Supplier (Name, Supplier Address).


Inventory (Description, Supplier, Cost)

Supplier (Name, Supplier Address)

The above relations are now in 2NF.
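The decomposition above is a pair of projections, and joining them back on the supplier should reproduce the original rows. The following sketch illustrates this under assumptions: the sample rows are invented for illustration (the slides give no data), the helper `project` is hypothetical, and the Supplier relation keeps the attribute name Supplier rather than Name so the join column matches.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep attrs, drop duplicate tuples, keep first-seen order."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Made-up sample rows, for illustration only.
inventory_1nf: List[Row] = [
    {"Description": "Hammer", "Supplier": "Acme", "Cost": 10, "Supplier Address": "1 Main St"},
    {"Description": "Hammer", "Supplier": "Bolt Co", "Cost": 12, "Supplier Address": "9 Oak Ave"},
    {"Description": "Wrench", "Supplier": "Acme", "Cost": 7, "Supplier Address": "1 Main St"},
]

inventory = project(inventory_1nf, ["Description", "Supplier", "Cost"])
supplier = project(inventory_1nf, ["Supplier", "Supplier Address"])

# Natural join on Supplier, to confirm that no information was lost.
rejoined = [{**i, **s} for i in inventory for s in supplier if i["Supplier"] == s["Supplier"]]

print(len(supplier))              # each supplier address is now stored once
print(rejoined == inventory_1nf)  # the join reproduces the original rows
```

In the sample data the address of Acme appears twice before decomposition and once afterwards, which is the redundancy that 2NF removes.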


Third Normal Form (1)

• Transitive functional dependency: let X, Y, Z be attributes of a relation. If X -> Y and Y -> Z, then Z is transitively dependent on X via Y, provided that X is NOT functionally dependent on Y or Z (nontrivial FD).

• Examples:
  • SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
  • SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
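A transitive dependency on the key can be spotted by looking for an FD whose determinant is neither part of the key nor a key itself. The sketch below is one possible check, not the method of the slides: the names `closure` and `transitive_dependencies` are hypothetical, and the relation is assumed to contain only the four attributes named in the example, with SSN as the key.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(
    key: Iterable[str], all_attrs: Set[str], fds: List[FD]
) -> List[Tuple[Tuple[str, ...], str]]:
    """Pairs (Y, Z) where key -> Y -> Z and Y is neither inside the key nor a key itself."""
    key_set = set(key)
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        if lhs <= key_set or is_superkey:
            continue  # Z depends directly on (part of) the key, or Y is itself a key
        for attr in sorted(rhs - lhs - key_set):
            found.append((tuple(sorted(lhs)), attr))
    return found


R = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
fds: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"SSN"}), frozenset({"DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DMGRSSN"})),
]

print(transitive_dependencies({"SSN"}, R, fds))  # [(('DNUMBER',), 'DMGRSSN')]
```

Only DMGRSSN is reported, via DNUMBER; ENAME is not, because its determinant is the key itself.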


Third Normal Form (2)

• A relation schema R is in third normal form (3NF) if:
  1. R is in 2NF, and
  2. No non-prime attribute A in R is transitively dependent on the primary key.

• R can be decomposed into 3NF relations via the process of 3NF normalization.

• NOTE:
  • In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
  • When Y is a candidate key, there is no problem with the transitive dependency.
  • E.g., consider EMP (SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.


Summary: Third Normal Form (3NF)

1) Meet all the requirements of 1NF.
2) Meet all the requirements of 2NF.
3) Remove columns that are not directly dependent upon the primary key (that is, columns that depend on it only transitively).


Example: 2NF -> 3NF

If transitive dependencies exist, place the transitively dependent attributes in a new relation along with a copy of their determinants.


Example: Third Normal Form (3NF)

• The relation in this example describes parcels of land for sale in various counties of a state. Suppose that there are two candidate keys: Property_id# and {County_name, Lot#}.
  • Lot# values are unique only within each county.
  • Property_id# values are unique across counties for the entire state.


Example: Third Normal Form (3NF)

Books (Name, Author's Name, Author's Nom de Plume, # of Pages)

• If I know # of Pages, can I find out Author's Name? No. Can I find out Author's Nom de Plume? No.

• If I know Author's Name, can I find out # of Pages? No. Can I find out Author's Nom de Plume? YES.

Therefore, Author's Nom de Plume is functionally dependent upon Author's Name, not the PK. It is moved to a new relation:

Books (Name, Author's Name, # of Pages)

Author (Name, Nom de Plume)


Review Example

STAFF_PROPERTY_INSPECTION (unnormalized relation)

Pno   pAddress              iDate      iTime  comments          StaffNo  sName  CarReg
PG4   Lawrence St, Glasgow  18-Oct-00  10:00  Replace crockery  SG37     Ann    M23JGR
                            22-Apr-01  09:00  Good order        SG14     David  M53HDR
                            1-Oct-01   12:00  Damp rot          SG14     David  N72HFR
PG16  5 Novar Dr., Glasgow  22-Apr-01  13:00  Replace carpet    SG14     David  M53HDR
                            24-Oct-01  14:00  Good condition    SG37     Ann    N72HFR


UNF -> 1NF

STAFF_PROPERTY_INSPECTION (1NF)

Pno   pAddress              iDate      iTime  comments          StaffNo  sName  CarReg
PG4   Lawrence St, Glasgow  18-Oct-00  10:00  Replace crockery  SG37     Ann    M23JGR
PG4   Lawrence St, Glasgow  22-Apr-01  09:00  Good order        SG14     David  M53HDR
PG4   Lawrence St, Glasgow  1-Oct-01   12:00  Damp rot          SG14     David  N72HFR
PG16  5 Novar Dr., Glasgow  22-Apr-01  13:00  Replace carpet    SG14     David  M53HDR
PG16  5 Novar Dr., Glasgow  24-Oct-01  14:00  Good condition    SG37     Ann    N72HFR


1NF -> 2NF

STAFF_PROPERTY_INSPECTION (Pno, pAddress, iDate, iTime, comments, StaffNo, sName, CarReg)

Partial dependency: Pno -> pAddress


1NF -> 2NF

PROPERTY_INSPECTION (Pno, iDate, iTime, comments, StaffNo, sName, CarReg): 2NF

PROPERTY (Pno, pAddress): 2NF

Transitive dependency: StaffNo -> sName


2NF -> 3NF

PROPERTY_INSPECT (Pno, iDate, iTime, comments, StaffNo, CarReg): 3NF

PROPERTY (Pno, pAddress): 3NF

STAFF (StaffNo, sName): 3NF
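The review example can be replayed as three projections of the 1NF relation, followed by a join that confirms nothing was lost. This is a sketch under assumptions: the helper `project` is hypothetical, the rows are read from the slide tables with sName placed next to StaffNo, and each PG4 row is assumed to carry the Lawrence St address, as the dependency Pno -> pAddress requires.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

COLUMNS = ["Pno", "pAddress", "iDate", "iTime", "comments", "StaffNo", "sName", "CarReg"]
DATA = [
    ("PG4", "Lawrence St, Glasgow", "18-Oct-00", "10:00", "Replace crockery", "SG37", "Ann", "M23JGR"),
    ("PG4", "Lawrence St, Glasgow", "22-Apr-01", "09:00", "Good order", "SG14", "David", "M53HDR"),
    ("PG4", "Lawrence St, Glasgow", "1-Oct-01", "12:00", "Damp rot", "SG14", "David", "N72HFR"),
    ("PG16", "5 Novar Dr., Glasgow", "22-Apr-01", "13:00", "Replace carpet", "SG14", "David", "M53HDR"),
    ("PG16", "5 Novar Dr., Glasgow", "24-Oct-01", "14:00", "Good condition", "SG37", "Ann", "N72HFR"),
]
rows_1nf: List[Row] = [dict(zip(COLUMNS, r)) for r in DATA]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep attrs, drop duplicate tuples, keep first-seen order."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


property_rel = project(rows_1nf, ["Pno", "pAddress"])    # removes Pno -> pAddress
staff_rel = project(rows_1nf, ["StaffNo", "sName"])      # removes StaffNo -> sName
property_inspect = project(
    rows_1nf, ["Pno", "iDate", "iTime", "comments", "StaffNo", "CarReg"]
)

# Join the three relations back on Pno and StaffNo.
rejoined = [
    {**i, **p, **s}
    for i in property_inspect
    for p in property_rel if p["Pno"] == i["Pno"]
    for s in staff_rel if s["StaffNo"] == i["StaffNo"]
]

print(len(property_rel), len(staff_rel), len(property_inspect))
print(rejoined == rows_1nf)
```

Each address and each staff name is stored once after decomposition, and the join returns the original five tuples.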


References

• “Database Systems: A Practical Approach to Design, Implementation and Management.” Thomas Connolly, Carolyn Begg. 5th Edition, Addison-Wesley, 2009.
