Data Normalization
Database Management Systems
Reading: Hoffer Chapter 4
Dr. Wingyan Chung
In logical DB design …
• Relations may not be well-structured, e.g.,
  • ORDERED_PRODUCT (Order_Line_ID, Product_ID, ProductName, ProductPrice, StandardPrice)
• Unnecessary duplication in attribute values
  • StandardPrice vs. ProductPrice
• Some attributes’ values determine other attributes’ values
  • Product_ID → ProductPrice
• Multiple themes exist in a relation, causing anomalies in insertion, deletion, and update
  • CustomerOrder (CustomerID, Name, OrderID, OrderDate, ProductOrdered, Quantity)
Data Normalization
• Validates and improves a logical DB design
• Decomposes relations that have anomalies to produce smaller, well-structured relations
• Different levels of normalization are achieved
  • Our focus: First, second, third normal forms
Example – Figure 4.2b
• Insertion anomaly – can’t enter a new employee without having the employee take a class
• Deletion anomaly – if we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification anomaly – giving a salary increase to employee 100 forces us to update multiple records
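The deletion and modification anomalies above can be seen on a few rows. This is only a sketch: the figure's data is not reproduced here, so the rows, salaries, and the name `employee2` are made up for illustration.

```python
# Hypothetical rows (Emp_ID, Salary, Course_Title); values are invented for illustration.
employee2 = [
    (100, 48000, "SPSS"),
    (100, 48000, "Surveys"),
    (140, 52000, "Tax Acc"),
]

# Deletion anomaly: removing employee 140 also removes the only trace of the Tax Acc class.
after_delete = [row for row in employee2 if row[0] != 140]
courses_left = {course for _, _, course in after_delete}

# Modification anomaly: a raise for employee 100 must be applied to every one of their rows.
rows_to_update = [row for row in employee2 if row[0] == 100]
```

Here `courses_left` no longer contains "Tax Acc", and `rows_to_update` holds two rows for a single salary change.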
Problem: table is not atomic
• Is it a relation?
  • No (each cell of a relation must have only one value)
An improved version
• No multivalued attributes
• Every attribute value is atomic
• Fig. 4-25 is not in 1st Normal Form (1NF) → it is not a relation
• By definition, all relations are in 1st Normal Form (no multivalued attributes)
• Fig. 4-26 is in 1NF, but still not well-structured
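One way to picture the move to 1NF is to repeat the single-valued attributes once for each entry in the repeating group. The sketch below assumes the multivalued cells are stored as parallel lists; the function name `to_1nf` and the sample row are hypothetical and not taken from Fig. 4-25.

```python
def to_1nf(rows, single_valued, multi_valued):
    """Expand rows whose multi_valued columns hold parallel lists into atomic rows."""
    flat = []
    for row in rows:
        # Number of entries in this row's repeating group.
        n = len(row[multi_valued[0]])
        for i in range(n):
            atomic = {col: row[col] for col in single_valued}
            for col in multi_valued:
                atomic[col] = row[col][i]
            flat.append(atomic)
    return flat


# Hypothetical invoice row: one order holding two products in a single cell.
invoice = [
    {"Order_ID": 1, "Customer_ID": 10,
     "Product_ID": [7, 5], "Order_Quantity": [2, 4]},
]
flat_invoice = to_1nf(invoice, ["Order_ID", "Customer_ID"],
                      ["Product_ID", "Order_Quantity"])
```

Every cell in `flat_invoice` holds one value, at the cost of repeating the order and customer data on each row, which is why a 1NF table can still be poorly structured.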
Anomalies in this Table
• Insertion – if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication
• Deletion – if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price
• Update – changing the price of product ID 4 requires update in several records
Why do anomalies exist?
• Multiple themes (entity types) exist in one relation, causing unnecessary dependencies among attributes
• Normally, all attributes are functionally dependent on primary key only
• Functional dependency = relationship between attributes such that one attribute’s values are determined by the other attributes
  • E.g., Emp_ID, Course_Title → Salary
  • (LHS = determinant; RHS = non-key attribute)
• Anomalies exist when some other attributes’ values depend on values of non-PK attributes or only part of PK
  • Partial functional dependency
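The definition of a functional dependency can be tested against sample rows: the dependency fails as soon as two rows agree on the determinant but disagree on the dependent attributes. The sketch below is illustrative only; `fd_holds` and the row values are hypothetical. Sample data can refute a dependency but cannot prove it, since a dependency is a rule about all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if equal lhs values always come with equal rhs values in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight of key, otherwise returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical rows; the salaries and course names are invented.
rows = [
    {"Emp_ID": 100, "Course_Title": "SPSS", "Salary": 48000},
    {"Emp_ID": 100, "Course_Title": "Surveys", "Salary": 48000},
    {"Emp_ID": 110, "Course_Title": "SPSS", "Salary": 55000},
]
by_full_key = fd_holds(rows, ["Emp_ID", "Course_Title"], ["Salary"])
by_emp_only = fd_holds(rows, ["Emp_ID"], ["Salary"])
by_course_only = fd_holds(rows, ["Course_Title"], ["Salary"])
```

In this sample, Salary is already determined by Emp_ID alone, which is the kind of dependency on part of the key that the last bullet describes.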
Partial functional dependency
• Examples of partial functional dependencies are
  • Product_ID → Product_Description
  • Product_ID → Unit_Price
  • Order_ID → Order_Date
• Must remove partial FD for a relation to be in second normal form (2NF)
  • 2NF = 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key (not just some components of PK)
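The 2NF test can be sketched as a check over a list of functional dependencies: a dependency is partial when its determinant is a proper subset of the primary key and it determines at least one non-key attribute. The function name `partial_dependencies` is hypothetical, and the composite key (Order_ID, Product_ID) is assumed for the example.

```python
def partial_dependencies(primary_key, fds):
    """List the FDs whose determinant is only part of the primary key."""
    pk = set(primary_key)
    partial = []
    for lhs, rhs in fds:
        non_key = tuple(a for a in rhs if a not in pk)
        # "<" on sets means proper subset: part of the key, but not all of it.
        if set(lhs) < pk and non_key:
            partial.append((tuple(lhs), non_key))
    return partial


fds = [
    (("Product_ID",), ("Product_Description",)),
    (("Product_ID",), ("Unit_Price",)),
    (("Order_ID",), ("Order_Date",)),
    (("Order_ID", "Product_ID"), ("Order_Quantity",)),
]
found = partial_dependencies(("Order_ID", "Product_ID"), fds)
```

A relation is in 2NF only when this list comes back empty; here the three single-attribute determinants are reported and the full-key dependency is not.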
• Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
• Customer_ID → Customer_Name, Customer_Address
• Product_ID → Product_Description, Product_Finish, Unit_Price
• Order_ID, Product_ID → Order_Quantity
Therefore, NOT in 2nd Normal Form
Figure 4-27 Functional dependency diagram for INVOICE
Getting the relations to 2NF
• Partial dependencies are removed, but there are still transitive dependencies
  • Non-key attributes determine values of some other non-key attributes, e.g.,
    • Order_ID → Customer_ID → Customer_Address
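Removing partial dependencies amounts to giving each part of the key its own relation. The sketch below groups the INVOICE attributes by the key component that determines them, using the dependencies listed for Figure 4-27; the function name `decompose_to_2nf` and the dictionary layout are assumptions made for illustration.

```python
def decompose_to_2nf(primary_key, fds):
    """Map each key-based determinant to the non-key attributes it determines."""
    pk = set(primary_key)
    relations = {}
    for lhs, rhs in fds:
        # Only determinants built from key attributes are handled at the 2NF step.
        if set(lhs) <= pk:
            attrs = relations.setdefault(tuple(lhs), [])
            attrs.extend(a for a in rhs if a not in attrs)
    return relations


invoice_fds = [
    (("Order_ID",), ("Order_Date", "Customer_ID", "Customer_Name", "Customer_Address")),
    (("Customer_ID",), ("Customer_Name", "Customer_Address")),
    (("Product_ID",), ("Product_Description", "Product_Finish", "Unit_Price")),
    (("Order_ID", "Product_ID"), ("Order_Quantity",)),
]
relations = decompose_to_2nf(("Order_ID", "Product_ID"), invoice_fds)
```

Each dictionary key is the primary key of a resulting relation. The Customer_ID dependency is left untouched inside the Order_ID relation, which is the transitive dependency that remains after this step.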
Figure 4-28 Removing partial dependencies
Transitive Dependency
• Examples of anomaly
  • Insertion – Must duplicate John Doe’s data if he places more orders
  • Deletion – Permanently loses Mary Smith’s data if order 1004 is canceled
  • Update – Need to update multiple records when John Doe changes his address
Order_ID  Order_Date  Customer_ID  Customer_Name  Customer_Address
1001      10/22/2005  501          John Doe       100 Mesa St.
1002      10/23/2005  501          John Doe       100 Mesa St.
1003      10/24/2005  501          John Doe       100 Mesa St.
1004      10/24/2005  504          Mary Smith     200 Sun Dr.
1005      10/24/2005  505          Susan Young    5243 Hill Blvd.
Getting the relations to 3NF
• 3NF = 2NF PLUS no transitive dependencies (no functional dependencies on non-PK attributes)
• A non-key determinant and the attributes it determines go into a new table
• The non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
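The rule above can be tried on the five order rows from the Transitive Dependency slide. In this sketch Customer_ID, the non-key determinant, becomes the key of a new customer table and stays behind in the order table as a foreign key. The variable names and the replacement address are hypothetical.

```python
# (Order_ID, Order_Date, Customer_ID, Customer_Name, Customer_Address)
orders_2nf = [
    (1001, "10/22/2005", 501, "John Doe", "100 Mesa St."),
    (1002, "10/23/2005", 501, "John Doe", "100 Mesa St."),
    (1003, "10/24/2005", 501, "John Doe", "100 Mesa St."),
    (1004, "10/24/2005", 504, "Mary Smith", "200 Sun Dr."),
    (1005, "10/24/2005", 505, "Susan Young", "5243 Hill Blvd."),
]

# New table: Customer_ID is the primary key; each customer is stored once.
customer = {cid: (name, addr) for _, _, cid, name, addr in orders_2nf}
# Old table: keeps Customer_ID only as a foreign key.
order = {oid: (date, cid) for oid, date, cid, _, _ in orders_2nf}

# Update: John Doe's address now changes in exactly one place (made-up address).
customer[501] = (customer[501][0], "1 Example Rd.")
# Deletion: canceling order 1004 no longer erases Mary Smith.
del order[1004]
```

After the split, the customer table has three rows instead of five repeated name and address pairs, and customer 504 survives the cancellation of her only order.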
Figure 4-29 Removing transitive dependencies
Getting it into Third Normal Form
Merging Relations
• View Integration – Combining entities from multiple ER models into common relations
• Issues to watch out for when merging entities from different ER models:
  • Synonyms – two or more attributes with different names but same meaning
  • Homonyms – attributes with same name but different meanings
  • Transitive dependencies – even if relations are in 3NF prior to merging, they may not be after merging
  • Supertype/subtype relationships – may be hidden prior to merging
Figure 4-31 Enterprise keys
a) Relations with enterprise key
b) Sample data with enterprise key
• Primary keys that are unique in the whole database, not just within a single relation
• Corresponds with the concept of an object ID in object-oriented systems
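One simple way to get keys that are unique across the whole database is to draw every identifier from a single shared sequence and record which relation each one belongs to. The sketch below is an assumption about how that could look, not the scheme shown in Figure 4-31; the class `EnterpriseKeys`, its method names, and the relation names are hypothetical.

```python
import itertools


class EnterpriseKeys:
    """Hands out IDs from one shared sequence and records which relation owns each."""

    def __init__(self, start=1):
        self._sequence = itertools.count(start)
        self.object_type = {}  # maps each issued ID to its relation name

    def new_id(self, relation):
        oid = next(self._sequence)
        self.object_type[oid] = relation
        return oid


keys = EnterpriseKeys()
# Hypothetical relations; the row contents are placeholders.
employee = {keys.new_id("EMPLOYEE"): "employee row"}
customer = {keys.new_id("CUSTOMER"): "customer row"}
```

Because both relations draw from the same sequence, no ID in `employee` can also appear in `customer`, and `keys.object_type` tells which relation any ID belongs to.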
Summary
• Anomalies exist when attribute values in a table are determined by non-PK attributes or only part of PK
• The table is in 1NF (i.e., a relation) if it contains no multivalued attribute
• The relation is in
  • 2NF if all non-key attributes are determined by the entire PK (not part of it) – i.e., no partial functional dependencies
  • 3NF if all non-key attributes are determined only by the PK (not other non-PK attributes) – i.e., no transitive dependencies
• Solution: Decomposing large relations into smaller relations
  • Remove partial and transitive dependencies
  • Possibly with FK referencing to parent relations