
IS202 Data Management

Chapter 4

Part 2: Normalization

Motivation for Data Normalization

• Why is an entity mapped to a relation? Why does a many-to-many relationship have to be mapped to a new relation? Why not use one relation to capture all the information?

[ER diagram: EMPLOYEE (Emp_ID, Name, Dept_Name, Salary) and COURSE (Course_ID, Course_Title), connected by the relationship Completes (Completed_Date)]

Motivation for Data Normalization

Question – Is this a relation?

Question – What’s the primary key?

Anomalies in this Table

• Insertion – can’t enter a new employee without having the employee take a class
• Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification – giving a salary increase to employee 100 forces us to update multiple records

Why do these anomalies exist?
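The three anomalies can be reproduced on a small in-memory table. This is only a sketch: the rows below are hypothetical stand-ins for the table on the slide (names, salaries, and dates are assumed), and the key is assumed to be the employee plus the course.

```python
# Hypothetical rows for one combined table; all values are illustrative only.
# Columns: (emp_id, name, dept_name, salary, course_title, date_completed)
rows = [
    (100, "Simpson", "Marketing", 48000, "SPSS", "2019-06-19"),
    (100, "Simpson", "Marketing", 48000, "Surveys", "2019-10-07"),
    (140, "Beeton", "Accounting", 52000, "Tax Acc", "2019-12-08"),
    (110, "Lucero", "Info Systems", 43000, "SPSS", "2019-01-12"),
]

# Modification anomaly: a raise for employee 100 must touch every row for 100.
rows_to_update = [r for r in rows if r[0] == 100]

# Deletion anomaly: removing employee 140 also removes the only mention of "Tax Acc".
courses_before = {r[4] for r in rows}
remaining = [r for r in rows if r[0] != 140]
courses_after = {r[4] for r in remaining}
lost_courses = courses_before - courses_after

# Insertion anomaly: a new employee without a course has no value for part of
# the assumed key (emp_id, course_title).
new_employee = (150, "Nguyen", "Finance", 45000, None, None)
missing_key_part = new_employee[4] is None

print(len(rows_to_update), lost_courses, missing_key_part)
```

Each anomaly comes from the same cause: facts about employees and facts about courses are stored in the same rows.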

Well-Structured Relations

• A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies

• Goal is to avoid anomalies
  – Insertion Anomaly – adding new rows forces user to create duplicate data
  – Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
  – Modification Anomaly – changing data in a row forces changes to other rows because of duplication

General rule of thumb: a table should not pertain to more than one entity type

Data Normalization

• A tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data

• The problems of having duplication of data
  – Waste of space
  – Difficulty in consistency control
  – Any other?

• The process of decomposing relations with anomalies to produce smaller, well-structured relations

Data Normalization

• Data Normalization
  – Functional Dependency
  – 1st Normal Form (1NF)
  – 2nd Normal Form (2NF)
  – 3rd Normal Form (3NF)

Figure 4.22 - Steps in normalization

First Normal Form

• No multivalued attributes
• Every attribute value is atomic

• All relations are in 1st Normal Form

Figure 4-2 (a) Table with multivalued attributes

Fig. 4-2a is not a relation (multivalued attributes). It is not in 1st Normal Form (1NF).

Figure 4-2b Eliminating multivalued attributes – EMPLOYEE2 relation

• Fig. 4-2b is in 1st Normal Form
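A minimal sketch of the step from a table with a multivalued attribute to a 1NF relation. The input rows and the helper name `to_first_normal_form` are hypothetical and are not taken from the figures; the idea is simply one output row per value of the multivalued attribute.

```python
# Hypothetical non-1NF table: each employee row holds a list of courses.
employees = [
    {"emp_id": 100, "name": "Simpson", "courses": ["SPSS", "Surveys"]},
    {"emp_id": 140, "name": "Beeton", "courses": ["Tax Acc"]},
]

def to_first_normal_form(table, multivalued):
    """Emit one row per value of the multivalued attribute."""
    flat = []
    for row in table:
        for value in row[multivalued]:
            new_row = {k: v for k, v in row.items() if k != multivalued}
            new_row[multivalued] = value  # now a single atomic value
            flat.append(new_row)
    return flat

employee2 = to_first_normal_form(employees, "courses")
for row in employee2:
    print(row)
```

Every value is now atomic, but the employee's name is repeated once per course, which is the redundancy the later normal forms address.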

Second Normal Form

• 1NF plus every non-key attribute is fully functionally dependent on the ENTIRE primary key
  – Every non-key attribute must be defined by the entire key, not by only part of the key
  – No partial functional dependencies

• True or False (based on data in the table only)
  – The value of Name determines that of Nationality
  – The value of Name determines that of Email
  – The value of Email determines that of AveGPA
  – The value of Nationality determines that of Email
  – The value of ID determines that of Nationality

STUDENT

Name    ID     Email               AveGPA   Nationality
Helen   7812   [email protected]   3.87     Singapore
Tom     1983   [email protected]   2.70     Singapore
Mike    8572   [email protected]   3.02     China
Mike    9255   [email protected]   3.87     India

Functional Dependencies and Keys

• Candidate Key:
  – An attribute (or a combination of multiple attributes) that can uniquely identify a row.
  – One of the candidate keys is chosen as the identifier (in the ER stage), which becomes the Primary Key (in the logical design stage)
    • E.g. for a student, both email address and student id are unique. However, only one of them becomes the identifier (and subsequently the primary key)
  – Non-key attributes: attributes that are not part of any candidate key

• Functional Dependency: The value of one attribute, or of a combination of attributes (the determinant), determines the value of another attribute
  – Each non-key field is functionally dependent on every candidate key
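One way to test a functional dependency against sample data is to check that no determinant value is paired with two different dependent values. The sketch below uses the STUDENT rows with the Email column omitted; the function name `holds` is hypothetical. Sample data can only refute a dependency, it cannot prove that one holds in general.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this determinant value
        if seen.setdefault(key, value) != value:
            return False
    return True

student = [
    {"Name": "Helen", "ID": 7812, "AveGPA": 3.87, "Nationality": "Singapore"},
    {"Name": "Tom", "ID": 1983, "AveGPA": 2.70, "Nationality": "Singapore"},
    {"Name": "Mike", "ID": 8572, "AveGPA": 3.02, "Nationality": "China"},
    {"Name": "Mike", "ID": 9255, "AveGPA": 3.87, "Nationality": "India"},
]

id_determines_name = holds(student, ["ID"], ["Name"])
gpa_determines_nationality = holds(student, ["AveGPA"], ["Nationality"])
print(id_determines_name, gpa_determines_nationality)
```

Here every ID is unique, so ID -> Name is not contradicted, while AveGPA 3.87 appears with two nationalities, so AveGPA -> Nationality fails.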

Student (Name, ID, Email, AveGPA, Nationality)

• True or False (assume each student has a unique email address):
  – Name is a non-key attribute:
  – Email is a candidate key:
  – Name is a candidate key:
  – Name depends on ID:
  – Name depends on Email:
  – Email depends on AveGPA:
  – Nationality depends on Email:
  – AveGPA depends on Name:

Representing Functional Dependencies

Student (Name, ID, Email, AveGPA, Nationality)

• Graphical representation: an arrow is drawn from the determinant to the dependent attribute
  – For example: ID is the determinant, and Name is functionally dependent on ID (ID -> Name)

• Text representation:
  – ID -> Name, AveGPA, Nationality
  – Email -> Name, AveGPA, Nationality

Partial Functional Dependency

• A functional dependency in which one or more non-key attributes are functionally dependent on part (but not all) of the primary key is defined as a partial functional dependency.

Fig 4.23(b) – Functional Dependencies in EMPLOYEE2

Dependency on entire primary key
  EmpID, CourseID -> DateCompleted

Dependencies on part of the key (partial functional dependencies)
  EmpID -> Name, DeptName, Salary
  CourseID -> Course_Title

Partial dependencies => NOT in 2nd Normal Form!!
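The 2NF test on EMPLOYEE2 can be written as a small check over the listed dependencies. This is a sketch, assuming the primary key is (EmpID, CourseID); the function name `partial_dependencies` is hypothetical.

```python
# Functional dependencies from Fig 4.23(b), written as (determinant, dependents).
fds = [
    ({"EmpID", "CourseID"}, {"DateCompleted"}),
    ({"EmpID"}, {"Name", "DeptName", "Salary"}),
    ({"CourseID"}, {"Course_Title"}),
]
primary_key = {"EmpID", "CourseID"}  # assumed composite key

def partial_dependencies(fds, primary_key):
    """Return FDs whose determinant is a proper subset of the primary key
    and whose dependents include at least one non-key attribute."""
    found = []
    for determinant, dependents in fds:
        non_key = dependents - primary_key
        if determinant < primary_key and non_key:  # '<' is proper subset
            found.append((determinant, non_key))
    return found

partials = partial_dependencies(fds, primary_key)
in_2nf = not partials
print(partials, in_2nf)
```

Two partial dependencies are reported (on EmpID and on CourseID), so the relation is not in 2NF.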

Getting it into 2nd Normal Form

• Decompose the relation into three separate relations

Emp (EmpID, Name, DeptName, Salary)
Emp_Course (EmpID, CourseID, DateCompleted)
Course (CourseID, CourseTitle)

No partial functional dependencies
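The decomposition can be sketched as three projections, followed by a join that confirms no information was lost. The rows and the CourseID values (C1, C2, C3) are hypothetical and serve only as an illustration.

```python
# Hypothetical EMPLOYEE2 rows:
# (EmpID, Name, DeptName, Salary, CourseID, CourseTitle, DateCompleted)
employee2 = [
    (100, "Simpson", "Marketing", 48000, "C1", "SPSS", "2019-06-19"),
    (100, "Simpson", "Marketing", 48000, "C2", "Surveys", "2019-10-07"),
    (140, "Beeton", "Accounting", 52000, "C3", "Tax Acc", "2019-12-08"),
    (110, "Lucero", "Info Systems", 43000, "C1", "SPSS", "2019-01-12"),
]

# Project onto the three relations; sets drop the duplicated facts.
emp = {(e, n, d, s) for e, n, d, s, _, _, _ in employee2}
course = {(c, t) for _, _, _, _, c, t, _ in employee2}
emp_course = {(e, c, dt) for e, _, _, _, c, _, dt in employee2}

# Rejoin on EmpID and CourseID to confirm the original rows come back.
emp_by_id = {e: (n, d, s) for e, n, d, s in emp}
title_by_id = {c: t for c, t in course}
rejoined = {
    (e, *emp_by_id[e], c, title_by_id[c], dt) for e, c, dt in emp_course
}
lossless = rejoined == set(employee2)
print(len(emp), len(course), len(emp_course), lossless)
```

Each employee and each course is now stored once, and Emp_Course keeps only the facts that depend on the whole key.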


Third Normal Form

• 2NF PLUS no transitive dependencies

• A transitive dependency is a functional dependency between two (or more) non-key attributes.

Example -- Relation with transitive dependency

(a) Customer_Order Relation with sample data

Order_ID  Order_Date  Customer_ID  Customer_Name  Customer_Address
001       09/30/2004  C1           Jack           15145 S.W. 17th St
002       09/30/2004  C1           Jack           15145 S.W. 17th St
003       04/04/2005  C1           Jack           15145 S.W. 17th St
004       07/21/2005  C2           Mary           1900 Allard Ave
005       08/13/2005  C2           Mary           1900 Allard Ave

Example -- Relation with transitive dependency

CUSTOMER_ORDER (Order_ID, Order_Date, Customer_ID, Customer_Name, Customer_Address)

Order_ID -> Order_Date
Order_ID -> Customer_ID
Order_ID -> Customer_Name
Order_ID -> Customer_Address

All this is OK (2nd NF)

BUT

Customer_ID -> Customer_Name
Customer_ID -> Customer_Address

Transitive Dependency: Not 3rd NF

Now, there are no transitive dependencies… Both relations are in 3rd NF

Order_ID -> Customer_ID

Customer_ID -> Customer_Name, Customer_Address
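The two 3NF relations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names `orders` and `customer`, the lowercase column names, the new address, and the assignment of customers to orders are all illustrative choices (`orders` is used because ORDER is an SQL keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id      TEXT PRIMARY KEY,
        customer_name    TEXT,
        customer_address TEXT
    );
    CREATE TABLE orders (
        order_id    TEXT PRIMARY KEY,
        order_date  TEXT,
        customer_id TEXT REFERENCES customer(customer_id)
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("C1", "Jack", "15145 S.W. 17th St"),
    ("C2", "Mary", "1900 Allard Ave"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("001", "09/30/2004", "C1"),
    ("002", "09/30/2004", "C1"),
    ("003", "04/04/2005", "C1"),
    ("004", "07/21/2005", "C2"),
    ("005", "08/13/2005", "C2"),
])

# A change of address now touches exactly one row (hypothetical new address).
cur = conn.execute(
    "UPDATE customer SET customer_address = ? WHERE customer_id = ?",
    ("1 New St", "C1"),
)
rows_changed = cur.rowcount

# The original wide table is still available through a join.
joined = conn.execute("""
    SELECT o.order_id, o.order_date, c.customer_id,
           c.customer_name, c.customer_address
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows_changed, len(joined))
```

In the single CUSTOMER_ORDER relation the same address change would have required updating every order row for that customer.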

Another Example

Fig 4-26 Invoice relation (1NF) (Pine Valley Furniture Company)

2NF: Remove Partial Functional Dependency

Invoice relation attributes: OrderID, OrderDate, CustomerID, CustomerName, CustomerAddress, ProductID, ProductDescription, ProductFinish, UnitPrice, OrderedQuantity

OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress
ProductID -> ProductDescription, ProductFinish, UnitPrice
OrderID, ProductID -> OrderedQuantity

Therefore, NOT in 2nd Normal Form
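Removing the partial dependencies amounts to giving each determinant its own relation. The sketch below works on attribute names only; the function name `decompose_by_determinant` is hypothetical, and it assumes the three dependencies listed for the Invoice relation are the complete set.

```python
# Dependencies of the Invoice relation as (determinant, dependents).
fds = [
    (("OrderID",), ["OrderDate", "CustomerID", "CustomerName", "CustomerAddress"]),
    (("ProductID",), ["ProductDescription", "ProductFinish", "UnitPrice"]),
    (("OrderID", "ProductID"), ["OrderedQuantity"]),
]

def decompose_by_determinant(fds):
    """Build one relation per determinant: the determinant becomes the key
    and its dependents become the non-key attributes."""
    grouped = {}
    for determinant, dependents in fds:
        grouped.setdefault(determinant, []).extend(dependents)
    return {key: list(key) + attrs for key, attrs in grouped.items()}

relations = decompose_by_determinant(fds)
for key, attributes in relations.items():
    print(key, "->", attributes)
```

The result is three relations, keyed by OrderID, by ProductID, and by the pair (OrderID, ProductID).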

Getting it into the Second Normal Form

Fig 4-28 Removing Partial Dependencies

No partial functional dependency, and all three relations are in 2NF

3NF: Remove Transitive Dependency

CustomerID -> CustomerName, CustomerAddress

Therefore, CUSTOMER_ORDER relation is NOT in 3rd Normal Form
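Following the definition used in these slides (a dependency among non-key attributes), a transitive dependency can be spotted by looking for determinants that contain no key attribute. A sketch, with a hypothetical function name and with the split into ORDER and CUSTOMER relations assumed:

```python
def transitive_dependencies(fds, key):
    """Return FDs whose determinant contains no key attribute, i.e. a
    dependency among non-key attributes."""
    return [(det, deps) for det, deps in fds if not (det & key)]

customer_order_fds = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
]
before = transitive_dependencies(customer_order_fds, key={"OrderID"})

# After splitting, each relation is checked against its own key.
order_fds = [({"OrderID"}, {"OrderDate", "CustomerID"})]
customer_fds = [({"CustomerID"}, {"CustomerName", "CustomerAddress"})]
after = (transitive_dependencies(order_fds, key={"OrderID"})
         + transitive_dependencies(customer_fds, key={"CustomerID"}))
print(before, after)
```

CustomerID is reported as a non-key determinant before the split, and nothing is reported afterwards.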

Getting it into the Third Normal Form

Fig 4-29 Removing Transitive Dependencies

Transitive dependencies are removed.

In Class Exercise 5.7

PRODUCT

Product_ID  Product_Description   Product_Finish  Standard_Price  Product_Line_Id
1           End Table             Cherry          $175.00         1
2           Coffee Table          Natural Ash     $200.00         2
3           Computer Desk         Natural Ash     $175.00         2
4           Entertainment Center  Walnut          $650.00         3
5           Writers Desk          Cherry          $325.00         1
6           8-Drawer Desk         White Ash       $750.00         2
7           Dining Table          Natural Ash     $800.00         2
8           Computer Desk         Walnut          $500.00         3

Find all the functional dependencies.

Is this in 2NF? If not, get it into 2NF.

Is this in 3NF? If not, get it into 3NF.

Data Normalization Summary

• 1st Normal Form
  – No multivalued attributes, and every attribute value is atomic
  – All relations are in 1st Normal Form

• 2nd Normal Form
  – 1NF + every non-key attribute is fully functionally dependent on the ENTIRE primary key
  – Decomposing the relation into two new relations

• 3rd Normal Form
  – 2NF PLUS no transitive dependencies
  – Decomposing the relation into two new relations

Other Normal Forms (from Appendix B)

• Boyce-Codd NF
  – All determinants are candidate keys…there is no determinant that is not a unique identifier

• 4th NF
  – No multivalued dependencies

• 5th NF
  – No join dependencies that are not implied by the candidate keys

• Domain-key NF
  – The “ultimate” NF…perfect elimination of all possible anomalies
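The Boyce-Codd condition can be checked mechanically with attribute closure: a determinant is a unique identifier when its closure covers every attribute of the relation. The sketch below uses hypothetical function names and applies the check to the CUSTOMER_ORDER dependencies from the earlier example; it assumes all listed dependencies are non-trivial and stay within the relation.

```python
def closure(attributes, fds):
    """All attributes determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in fds:
            if determinant <= result and not dependents <= result:
                result |= dependents
                changed = True
    return result

def is_bcnf(all_attributes, fds):
    """True when every determinant's closure covers the whole relation."""
    return all(closure(det, fds) == set(all_attributes) for det, _ in fds)

customer_order = {"Order_ID", "Order_Date", "Customer_ID",
                  "Customer_Name", "Customer_Address"}
customer_order_fds = [
    ({"Order_ID"}, {"Order_Date", "Customer_ID"}),
    ({"Customer_ID"}, {"Customer_Name", "Customer_Address"}),
]
customer = {"Customer_ID", "Customer_Name", "Customer_Address"}
customer_fds = [({"Customer_ID"}, {"Customer_Name", "Customer_Address"})]

print(is_bcnf(customer_order, customer_order_fds))  # Customer_ID is not a key here
print(is_bcnf(customer, customer_fds))
```

In CUSTOMER_ORDER the determinant Customer_ID does not identify a whole row, so the check fails; in the separate customer relation it does.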


In Class Exercise 5.8

Transitive Dependency Removal

INVOICE

In Class Exercise 5.9

• The structure and sample data are provided for the following table. Break it into relations in 3NF (assumption: Dept_Manager must be an Emp who has a unique Emp_Code, and Emp_Educ is a multi-valued attribute).

Employee

Emp_Code  Emp_Name  Emp_Educ  Dept_Code  Dept_Name  Dept_Manager
1003      Willaker  BS, MBA   MKTG       Marketing  1012

Continued...

Job_Class  Job_Title    Emp_DoB     Emp_Hire_Date  Job_Base_Salary
23         Sales Agent  10/23/1968  10/14/1997     $32,255

[Diagram annotations: two transitive dependencies are marked on the Employee table]

In Class Exercise 5.10

• Consider the following relation definition and sample data:
  PROJECT (ProjectID, EmployeeName, EmployeeSalary)
  – ProjectID: name of a work project
  – EmployeeName: name of an employee
  – EmployeeSalary: salary of the employee

PROJECT

ProjectID  EmployeeName  EmployeeSalary
100A       Jones         64K
100A       Smith         51K
100B       Smith         51K
200A       Jones         64K
200B       Jones         64K
200C       Parks         28K
200C       Smith         51K
200D       Parks         28K

In Class Exercise (cont’d)

• Assuming that all of the functional dependencies and constraints are apparent in this data, which of the following statements is true?
  – ProjectID -> EmployeeName
  – ProjectID -> EmployeeSalary
  – (ProjectID, EmployeeName) -> EmployeeSalary
  – EmployeeName -> EmployeeSalary
  – EmployeeSalary -> ProjectID
  – EmployeeSalary -> (ProjectID, EmployeeName)

In Class Exercise (cont’d)

• What is the key of PROJECT?
• Are all non-key attributes (if any) dependent on all of the key?
• In what normal form is PROJECT?
• Describe one modification anomaly from which PROJECT suffers.
• Is ProjectID a determinant?
• Is EmployeeName a determinant?


In Class Exercise (cont’d)

• Is (ProjectID, EmployeeName) a determinant?

• Is EmployeeSalary a determinant?

• Does this relation contain a partial dependency? If so, what is it?

• Redesign this relation to eliminate the modification anomalies.

What we have learned

• Key concepts:
  – Functional dependency
  – Partial functional dependency
  – Transitive dependency

• Data normalization
  – 1NF: must be a relation
  – 2NF: 1NF + no partial functional dependency
  – 3NF: 2NF + no transitive dependency

What you need to do

• Review Chapter 4 (part 2)

Concepts                                               Recommendation
Introduction to Normalization                          √
Normalization Example: Pine Valley Furniture Company   ☺
A Final Step for Defining Relational Keys              ☺

√ : must read, ☺: good for you to read