Motivation for Data Normalization
• Why is an entity mapped to a relation? Why does a many-to-many relationship have to be mapped to a new relation? Why not use one relation to capture all the information?
[Figure: EMPLOYEE (Emp_ID, Name, Dept_Name, Salary) and COURSE (Course_ID, Course_Title), linked by the relationship Completes (Completed_Date)]
Motivation for Data Normalization
Question – Is this a relation?
Question – What’s the primary key?
Anomalies in this Table
• Insertion – can’t enter a new employee without having the employee take a class
• Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification – giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist?
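The three anomalies can be reproduced on a small in-memory table. The sketch below is only an illustration: the slide's table is not reproduced here, so the rows are hypothetical (only employee 100, employee 140 and the Tax Acc class come from the slide), and the composite key (Emp_ID, Course_Title) is an assumption.

```python
# Hypothetical rows: only employee 100, employee 140 and the Tax Acc class
# are named on the slide; every other value is invented for illustration.
KEY = ("Emp_ID", "Course_Title")  # assumed composite primary key

table = [
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Salary": 52000, "Course_Title": "Tax Acc"},
]

def violates_key(row):
    """Insertion: a row is rejected if any key attribute is missing."""
    return any(row.get(attr) is None for attr in KEY)

def give_raise(rows, emp_id, new_salary):
    """Modification: rewrite the salary in every row of the employee."""
    touched = 0
    for row in rows:
        if row["Emp_ID"] == emp_id:
            row["Salary"] = new_salary
            touched += 1
    return touched

def courses_lost_by_deleting(rows, emp_id):
    """Deletion: courses that vanish along with the employee's rows."""
    before = {row["Course_Title"] for row in rows}
    after = {row["Course_Title"] for row in rows if row["Emp_ID"] != emp_id}
    return before - after

new_hire = {"Emp_ID": 200, "Salary": 40000, "Course_Title": None}
insert_blocked = violates_key(new_hire)
rows_updated = give_raise(table, 100, 50000)
lost_courses = courses_lost_by_deleting(table, 140)
print(insert_blocked, rows_updated, lost_courses)
```

In this sketch the new hire cannot be stored without a course, one raise touches two rows, and deleting employee 140 removes the only record of Tax Acc.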
Well-Structured Relations
• A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
• Goal is to avoid anomalies
– Insertion Anomaly – adding new rows forces user to create duplicate data
– Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
– Modification Anomaly – changing data in a row forces changes to other rows because of duplication
General rule of thumb: a table should not pertain to more than one entity type
Data Normalization
• A tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
• The problems of having duplication of data
– Waste of space
– Difficulty in consistency control
– Any other?
• The process of decomposing relations with anomalies to produce smaller, well-structured relations
Data Normalization
• Data Normalization
– Functional Dependency
– 1st Normal Form (1NF)
– 2nd Normal Form (2NF)
– 3rd Normal Form (3NF)
First Normal Form
• No multivalued attributes
• Every attribute value is atomic
• All relations are in 1st Normal Form
Fig. 4-2a is not a relation (multivalued attributes). It is not in 1st Normal Form (1NF).
Figure 4-2 (a) Table with multivalued attributes
Figure 4-2b Eliminating multivalued attributes – EMPLOYEE2 relation
• Fig. 4-2b is in 1st Normal Form
Second Normal Form
• 1NF plus every non-key attribute is fully functionally dependent on the ENTIRE primary key
– Every non-key attribute must be defined by the entire key, not by only part of the key
– No partial functional dependencies
• True or False (based on data in the table only)
– The value of Name determines that of Nationality
– The value of Name determines that of Email
– The value of Email determines that of AveGPA
– The value of Nationality determines that of Email
– The value of ID determines that of Nationality
STUDENT
Name   ID    Email              AveGPA  Nationality
Helen  7812  [email protected]  3.87    Singapore
Tom    1983  [email protected]  2.70    Singapore
Mike   8572  [email protected]  3.02    China
Mike   9255  [email protected]  3.87    India
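Statements of this kind can be checked mechanically against the sample rows. The following is a minimal sketch, not part of the original slides: the helper name `holds` is hypothetical, and the Email column is left out because the addresses are obscured in this copy of the table.

```python
def holds(rows, lhs, rhs):
    """Return True if no two sample rows contradict lhs -> rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[attr] for attr in lhs)
        right = tuple(row[attr] for attr in rhs)
        # The first row with this left-hand value fixes the expected
        # right-hand value; any later disagreement refutes the dependency.
        if seen.setdefault(left, right) != right:
            return False
    return True

students = [
    {"Name": "Helen", "ID": 7812, "AveGPA": 3.87, "Nationality": "Singapore"},
    {"Name": "Tom", "ID": 1983, "AveGPA": 2.70, "Nationality": "Singapore"},
    {"Name": "Mike", "ID": 8572, "AveGPA": 3.02, "Nationality": "China"},
    {"Name": "Mike", "ID": 9255, "AveGPA": 3.87, "Nationality": "India"},
]

name_determines_nationality = holds(students, ["Name"], ["Nationality"])
id_determines_nationality = holds(students, ["ID"], ["Nationality"])
print(name_determines_nationality, id_determines_nationality)
```

A check like this can only refute a dependency. A result of True means the sample does not contradict it, which is why the slide says "based on data in the table only".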
Functional Dependencies and Keys
• Candidate Key:
– An attribute (or a combination of multiple attributes) that can uniquely identify a row.
– One of the candidate keys is chosen as identifier (in ER stage), which becomes the Primary Key (in LD stage)
• E.g. for a student, both email address and student id are unique. However, only one of them becomes the identifier (and subsequently the primary key)
– Non-key attributes: attributes that are not part of any candidate key
• Functional Dependency: The value of one attribute, or of a combination of attributes (the determinant), determines the value of another attribute
– Each non-key field is functionally dependent on every candidate key
Student (Name, ID, Email, AveGPA, Nationality)
• True or False (assume each student has a unique email address):
– Name is a non-key attribute:
– Email is a candidate key:
– Name is a candidate key:
– Name depends on ID:
– Name depends on Email:
– Email depends on AveGPA:
– Nationality depends on Email:
– AveGPA depends on Name:
Representing Functional Dependencies
Student (Name, ID, Email, AveGPA, Nationality)
• Graphical representation: an arrow is drawn from the determinant to the dependent attribute
– For example: ID is the determinant, and Name is functionally dependent on ID (arrow from ID to Name)
• Text representation:
– ID -> Name, AveGPA, Nationality
– Email -> Name, AveGPA, Nationality
Partial Functional Dependency
• A functional dependency in which one or more non-key attributes are functionally dependent on part (but not all) of the primary key is defined as a partial functional dependency.
Fig 4.23(b) – Functional Dependencies in EMPLOYEE2
Dependency on entire primary key
EmpID, CourseID -> DateCompleted
Dependencies on part of the key (partial functional dependencies)
EmpID -> Name, DeptName, Salary
CourseID -> Course_Title
Partial dependencies => NOT in 2nd Normal Form!!
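Given the primary key and a list of dependencies, partial dependencies can be picked out by testing whether a determinant is a proper subset of the key. The sketch below assumes the EMPLOYEE2 key is (EmpID, CourseID), as Fig 4.23(b) indicates. The function name and the pair-of-lists representation are illustrative choices, not something the slides prescribe.

```python
def partial_dependencies(primary_key, fds):
    """Return the dependencies whose determinant is only part of the key."""
    key = frozenset(primary_key)
    found = []
    for determinant, dependents in fds:
        det = frozenset(determinant)
        non_key = sorted(a for a in dependents if a not in key)
        # "det < key" is a proper-subset test: part of the key, not all of it.
        if det < key and non_key:
            found.append((sorted(det), non_key))
    return found

employee2_fds = [
    (["EmpID", "CourseID"], ["DateCompleted"]),
    (["EmpID"], ["Name", "DeptName", "Salary"]),
    (["CourseID"], ["Course_Title"]),
]

partials = partial_dependencies(["EmpID", "CourseID"], employee2_fds)
for determinant, dependents in partials:
    print(", ".join(determinant), "->", ", ".join(dependents))
```

An empty result would mean the relation has no partial dependencies with respect to the listed dependencies and is therefore in 2NF, provided it is already in 1NF.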
Getting it into 2nd Normal Form
• Decompose the relation into three separate relations
Emp (EmpID, Name, DeptName, Salary)
Emp_Course (EmpID, CourseID, DateCompleted)
Course (CourseID, CourseTitle)
No partial functional dependencies
Third Normal Form
• 2NF PLUS no transitive dependencies
• A transitive dependency is a functional dependency between two (or more) non-key attributes.
Example -- Relation with transitive dependency
(a) Customer_Order Relation with sample data
Order_ID Order_Date Customer_ID Customer_Name Customer_Address
001 09/30/2004 C1 Jack 15145 S.W. 17th St
002 09/30/2004 C1 Jack 15145 S.W. 17th St
003 04/04/2005 C2 Jack 15145 S.W. 17th St
004 07/21/2005 C1 Mary 1900 Allard Ave
005 08/13/2005 C2 Mary 1900 Allard Ave
Example -- Relation with transitive dependency
CUSTOMER_ORDER (Order_ID, Order_Date, Customer_ID, Customer_Name, Customer_Address)
Order_ID -> Order_Date
Order_ID -> Customer_ID
Order_ID -> Customer_Name
Order_ID -> Customer_Address
All this is OK (2nd NF)
BUT
Customer_ID -> Customer_Name
Customer_ID -> Customer_Address
Transitive Dependency: Not 3rd NF
Now, there are no transitive dependencies… Both relations are in 3rd NF
Order_ID -> Customer_ID
Customer_ID -> Customer_Name, Customer_Address
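The decomposition step can be sketched as a small function: any dependency whose determinant lies entirely outside the primary key is treated as transitive, its dependents move to a new relation, and the determinant stays behind as a foreign key. This is an illustrative sketch using the CUSTOMER_ORDER dependencies from the example. The function name is hypothetical, and the code assumes the relation is already in 2NF.

```python
def split_transitive(attributes, primary_key, fds):
    """Split off every dependency between non-key attributes."""
    key = set(primary_key)
    remaining = list(attributes)
    extracted = []
    for determinant, dependents in fds:
        # A determinant sharing nothing with the key is a non-key determinant.
        if key.isdisjoint(determinant):
            extracted.append(list(determinant) + list(dependents))
            remaining = [a for a in remaining if a not in dependents]
    return remaining, extracted

customer_order = ["Order_ID", "Order_Date", "Customer_ID",
                  "Customer_Name", "Customer_Address"]
fds = [
    (["Order_ID"], ["Order_Date", "Customer_ID",
                    "Customer_Name", "Customer_Address"]),
    (["Customer_ID"], ["Customer_Name", "Customer_Address"]),
]

order_relation, new_relations = split_transitive(customer_order, ["Order_ID"], fds)
print(order_relation)
print(new_relations)
```

Customer_ID appears in both resulting relations: as the key of the new customer relation and as a foreign key in the order relation.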
2NF: Remove Partial Functional Dependency
(OrderID, OrderDate, CustomerID, CustomerName, CustomerAddress, ProductID, ProductDescription, ProductFinish, UnitPrice, OrderedQuantity)
OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress
ProductID -> ProductDescription, ProductFinish, UnitPrice
OrderID, ProductID -> OrderedQuantity
Therefore, NOT in 2nd Normal Form
Getting it into the Second Normal Form
No partial functional dependency, and all three relations are in 2NF
Fig 4-28 Removing Partial Dependencies
3NF: Remove Transitive Dependency
CustomerID -> CustomerName, CustomerAddress
Therefore, CUSTOMER_ORDER relation is NOT in 3rd Normal Form
Getting it into the Third Normal Form
Transitive dependencies are removed.
Fig 4-29 Removing Transitive Dependencies
In Class Exercise 5.7
PRODUCT
Product_ID  Product_Description   Product_Finish  Standard_Price  Product_Line_Id
1           End Table             Cherry          $175.00         1
2           Coffee Table          Natural Ash     $200.00         2
3           Computer Desk         Natural Ash     $175.00         2
4           Entertainment Center  Walnut          $650.00         3
5           Writers Desk          Cherry          $325.00         1
6           8-Drawer Desk         White Ash       $750.00         2
7           Dining Table          Natural Ash     $800.00         2
8           Computer Desk         Walnut          $500.00         3
Find all the functional dependencies.
Is this in 2NF? If not, get it into 2NF.
Is this in 3NF? If not, get it into 3NF.
Data Normalization Summary
• 1st Normal Form
– no multivalued attributes, and every attribute value is atomic
– All relations are in 1st Normal Form
• 2nd Normal Form
– 1NF + every non-key attribute is fully functionally dependent on the ENTIRE primary key
– Decomposing the relation into two new relations
• 3rd Normal Form
– 2NF PLUS no transitive dependencies
– Decomposing the relation into two new relations
Other Normal Forms (from Appendix B)
• Boyce-Codd NF
– All determinants are candidate keys…there is no determinant that is not a unique identifier
• 4th NF
– No multivalued dependencies
• 5th NF
– No “lossless joins”
• Domain-key NF
– The “ultimate” NF…perfect elimination of all possible anomalies
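The Boyce-Codd condition can be tested with an attribute-closure computation: a determinant identifies rows uniquely when its closure covers every attribute of the relation. The sketch below reuses the CUSTOMER_ORDER dependencies from the earlier 3NF example. The function names are illustrative, and the check tests only uniqueness, not the minimality that a candidate key also requires.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in fds:
            if set(determinant) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Determinants whose closure does not cover the whole relation."""
    everything = set(attributes)
    return [list(det) for det, _ in fds if closure(det, fds) != everything]

customer_order = ["Order_ID", "Order_Date", "Customer_ID",
                  "Customer_Name", "Customer_Address"]
fds = [
    (["Order_ID"], ["Order_Date", "Customer_ID",
                    "Customer_Name", "Customer_Address"]),
    (["Customer_ID"], ["Customer_Name", "Customer_Address"]),
]

violations = bcnf_violations(customer_order, fds)
print(violations)
```

Here Customer_ID is reported because it determines other attributes without identifying a whole row, the same dependency that kept the relation out of 3NF.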
In Class Exercise 5.9
• The structure and sample data are provided for the following table. Break it into relations in 3NF (assumption: Dept_Manager must be an Emp who has a unique Emp_Code, and Emp_Educ is a multi-valued attribute).
Employee
Emp_Code Emp_Name Emp_Educ Dept_Code Dept_Name Dept_Manager Job_Class Job_Title Emp_DoB Emp_Hire_Date Job_Base_Salary
1003 Willaker BS, MBA MKTG Marketing 1012 23 Sales Agent 10/23/1968 10/14/1997 $32,255
[Figure annotations: two transitive dependencies are marked on the Employee relation]
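One possible first step toward 1NF is to move the multi-valued Emp_Educ attribute into its own relation, with one row per employee and degree. The sketch below uses only the sample row from the exercise. The helper name and the comma-separated encoding of Emp_Educ are assumptions made for illustration.

```python
def split_multivalued(row, key_attr, multi_attr, sep=","):
    """Turn one row holding several values into one row per value."""
    return [
        {key_attr: row[key_attr], multi_attr: value.strip()}
        for value in row[multi_attr].split(sep)
    ]

employee = {"Emp_Code": 1003, "Emp_Name": "Willaker", "Emp_Educ": "BS, MBA"}

emp_educ_rows = split_multivalued(employee, "Emp_Code", "Emp_Educ")
for educ_row in emp_educ_rows:
    print(educ_row)
```

Each resulting row holds a single atomic Emp_Educ value, and the pair (Emp_Code, Emp_Educ) identifies it.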
In Class Exercise 5.10
• Consider the following relation definition and sample data:
PROJECT (ProjectID, EmployeeName, EmployeeSalary)
– ProjectID: name of a work project
– EmployeeName: name of an employee
– EmployeeSalary: salary of the employee
PROJECT
ProjectID  EmployeeName  EmployeeSalary
100A       Jones         64K
100A       Smith         51K
100B       Smith         51K
200A       Jones         64K
200B       Jones         64K
200C       Parks         28K
200C       Smith         51K
200D       Parks         28K
In Class Exercise (cont’d)
• Assuming that all of the functional dependencies and constraints are apparent in this data, which of the following statements are true?
– ProjectID -> EmployeeName
– ProjectID -> EmployeeSalary
– (ProjectID, EmployeeName) -> EmployeeSalary
– EmployeeName -> EmployeeSalary
– EmployeeSalary -> ProjectID
– EmployeeSalary -> (ProjectID, EmployeeName)
In Class Exercise (cont’d)
• What is the key of PROJECT?
• Are all non-key attributes (if any) dependent on all of the key?
• In what normal form is PROJECT?
• Describe one modification anomaly from which PROJECT suffers.
• Is ProjectID a determinant?
• Is EmployeeName a determinant?
In Class Exercise (cont’d)
• Is (ProjectID, EmployeeName) a determinant?
• Is EmployeeSalary a determinant?
• Does this relation contain a partial dependency? If so, what is it?
• Redesign this relation to eliminate the modification anomalies.
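One possible redesign, offered as a sketch and not as the official answer, splits PROJECT into an assignment relation (ProjectID, EmployeeName) and a salary relation (EmployeeName, EmployeeSalary). The code below projects the sample rows onto both and checks that joining them on EmployeeName gives back exactly the original rows. The variable names are hypothetical.

```python
project = [
    ("100A", "Jones", "64K"), ("100A", "Smith", "51K"),
    ("100B", "Smith", "51K"), ("200A", "Jones", "64K"),
    ("200B", "Jones", "64K"), ("200C", "Parks", "28K"),
    ("200C", "Smith", "51K"), ("200D", "Parks", "28K"),
]

# Project the rows onto the two smaller relations, dropping duplicates.
assignment = sorted({(pid, emp) for pid, emp, _ in project})
salary = sorted({(emp, pay) for _, emp, pay in project})

# If each employee has exactly one salary, the salary relation is a lookup.
salary_of = dict(salary)
one_salary_per_employee = len(salary_of) == len(salary)

# Natural join on EmployeeName, then compare with the original rows.
rejoined = sorted((pid, emp, salary_of[emp]) for pid, emp in assignment)
lossless = rejoined == sorted(project)
print(len(assignment), len(salary), one_salary_per_employee, lossless)
```

With this design each salary is stored once per employee, so a salary change touches a single row of the salary relation.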
What we have learned
• Key concepts:
– Functional dependency
– Partial functional dependency
– Transitive dependency
• Data normalization
– 1NF: must be a relation
– 2NF: 1NF + no partial functional dependency
– 3NF: 2NF + no transitive dependency