edwina-bishop
1
NORMALIZATION
Part 1: The Concept
2
Objectives
How to undertake the normalization process.
How normalization uses functional dependencies to group attributes into relations that are in a known normal form.
How to identify the most commonly used normal forms: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF).
The problems associated with relations that break the rules of 1NF, 2NF, or 3NF.
How to represent attributes shown on a form as 3NF relations using normalization.
3
Introduction
Two strategies for creating a data model:
Top-down: the approach to database design adopted so far, which begins by identifying the entities and relationships.
Bottom-up: normalization, an approach to database design that begins by examining the relationships between attributes.
4
Database Design Strategies
Two classical approaches/strategies to database design:
Top-down: Conceptual Model → Entity → Attribute
Bottom-up (normalization): Attribute → Entity → Conceptual Model
5
Table & Relational Schema
staff
staffNO  sName  position    salary  branchNO
S21      Johan  Manager     3000    B005
S37      Ana    Assistant   1200    B003
S14      Daud   Supervisor  1800    B003
S9       Mary   Assistant   900     B007
S5       Siti   Manager     2400    B003
S41      Jani   Assistant   900     B005

branch
branchNO  bAddress
B005      123, Kepong
B007      456, Nilai
B003      789, PTP

Relational schema:
staff (staffNO, sName, position, salary, *branchNO)
branch (branchNO, bAddress)
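
As a minimal sketch, the staff and branch relations can be held in Python as lists of dictionaries, with a check that every foreign-key value in staff.branchNO also appears in branch.branchNO. The helper name `dangling_foreign_keys` is an assumption made for illustration; it is not part of the original material.

```python
# Rows of the staff and branch relations from the slide.
staff = [
    {"staffNO": "S21", "sName": "Johan", "position": "Manager", "salary": 3000, "branchNO": "B005"},
    {"staffNO": "S37", "sName": "Ana", "position": "Assistant", "salary": 1200, "branchNO": "B003"},
    {"staffNO": "S14", "sName": "Daud", "position": "Supervisor", "salary": 1800, "branchNO": "B003"},
    {"staffNO": "S9", "sName": "Mary", "position": "Assistant", "salary": 900, "branchNO": "B007"},
    {"staffNO": "S5", "sName": "Siti", "position": "Manager", "salary": 2400, "branchNO": "B003"},
    {"staffNO": "S41", "sName": "Jani", "position": "Assistant", "salary": 900, "branchNO": "B005"},
]
branch = [
    {"branchNO": "B005", "bAddress": "123, Kepong"},
    {"branchNO": "B007", "bAddress": "456, Nilai"},
    {"branchNO": "B003", "bAddress": "789, PTP"},
]


def dangling_foreign_keys(child, parent, column):
    """Return child rows whose `column` value has no match in the parent relation."""
    parent_keys = {row[column] for row in parent}
    return [row for row in child if row[column] not in parent_keys]


# Every staff row refers to an existing branch, so nothing is reported.
print(dangling_foreign_keys(staff, branch, "branchNO"))  # prints []
```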
6
Normalization
When we design a database, the main objective is to create an accurate representation of the data, the relationships between the data, and the relevant constraints on the data.
To achieve this objective, we have to identify a suitable set of relations (tables) by creating good table structures, that is, by assigning attributes to entities. This process is known as normalization.
Normalization is a process for evaluating and correcting table structures to minimize (control) data redundancy and thereby reduce data anomalies.
It works through a series of stages called normal forms.
7
Normalization
The most commonly used normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
1NF < 2NF < 3NF
The highest level of normalization is not always desirable:
More joins are required.
Data retrieval performance is affected (higher response time).
For most business database design purposes, 3NF is as high as we need to go in the normalization process.
8
Normalization
Every normal form is based on functional dependencies between the attributes of a relation.
Each relation can be normalized into a specific normal form to avoid anomalies.
Anomalies?
Anomaly = abnormality
Ideally, a change to a field value should be made in only a single place.
Data redundancy promotes an abnormal condition by forcing field value changes in many different locations.
Types of anomalies: insertion anomalies, deletion anomalies, modification/update anomalies.
9
Functional Dependencies
An important concept associated with normalization is functional dependency which describes the relationship between attributes.
In this section, you will learn about functional dependency and then focus on the particular characteristics of functional dependency that are useful for normalization.
10
Functional Dependencies
Functional dependency can be divided into two types:
Full functional dependency/partial dependency (PD): used to transform 1NF → 2NF
Transitive dependency (TD): used to transform 2NF → 3NF
11
Functional Dependencies
Multivalued attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified, directly or indirectly, by the value of the primary key or a part of it (that is, they are not functionally dependent on it).
Relational Schema
STUDENT(Stud_ID, Name, (Course_ID, Units))
12
Functional Dependencies
Partial Dependency – when a non-key attribute is determined by a part, but not the whole, of a composite primary key (the primary key must be a composite key).
Cust_ID → Name
13
Functional Dependencies
Transitive Dependency – when a non-key attribute determines another non-key attribute.
Dept_ID → Dept_Name
14
Functional Dependencies
Consider a relation R with attributes A and B, where attribute B is functionally dependent on attribute A. Let us say that A is the PK of R.
One way to describe the relationship between attributes A and B is to say that "A functionally determines B".
R(A, B)
A → B
B is functionally dependent on A
15
Functional Dependencies
When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called the determinant.
Determinant: the attribute, or group of attributes, on the left-hand side of the arrow of a functional dependency.
A → B
A functionally determines B
16
Functional Dependencies
staff
staffNO  sName  position    salary  branchNO
S21      Johan  Manager     3000    B005
S37      Ana    Assistant   1200    B003
S14      Daud   Supervisor  1800    B003
S9       Mary   Assistant   900     B007
S5       Siti   Manager     2400    B003
S41      Jani   Assistant   900     B005

branch
branchNO  bAddress
B005      123, Kepong
B007      456, Nilai
B003      789, PTP

[Figure annotation: Determinant]
17
Functional Dependencies
Consider the attributes staffNO and position of the staff relation.
For a specific staffNO (S21), we can determine the position of that member of staff as Manager. staffNO functionally determines position.
Staff number (S21) → Position (Manager)
staffNO → position
position is functionally dependent on staffNO
18
Functional Dependencies
However, the next figure illustrates that the opposite is not true, as position does not functionally determine staffNO.
A member of staff holds one position; however, there may be several members of staff with the same position.
Position (Manager) → staff number (S21), staff number (S5)
position does not determine staffNO
staffNO is not functionally dependent on position
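
The two directions can be checked against the sample staff rows with a short sketch. The function name `fd_holds` is an assumption made for illustration. Note that sample data can only refute a functional dependency; whether one truly holds is a statement about the meaning of the attributes, not just the rows at hand.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to a single combination of rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# staffNO and position columns of the staff relation.
staff = [
    {"staffNO": "S21", "position": "Manager"},
    {"staffNO": "S37", "position": "Assistant"},
    {"staffNO": "S14", "position": "Supervisor"},
    {"staffNO": "S9", "position": "Assistant"},
    {"staffNO": "S5", "position": "Manager"},
    {"staffNO": "S41", "position": "Assistant"},
]

print(fd_holds(staff, ["staffNO"], ["position"]))  # True
print(fd_holds(staff, ["position"], ["staffNO"]))  # False: Manager maps to S21 and S5
```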
19
Functional Dependencies
Partial Dependencies:
Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
staff(staffNO, sName, position, salary, branchNO)
staffNO, sName → branchNO
True: each value of (staffNO, sName) is associated with a single value of branchNO.
However, branchNO is also functionally dependent on staffNO alone, which is a proper subset of (staffNO, sName), so this dependency is partial rather than full.
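
A minimal sketch of the full-dependency test, assuming the sample staff rows are representative: a dependency is full when it holds and no proper, non-empty subset of the determinant also determines the same attributes. The names `fd_holds` and `is_full_dependency` are hypothetical helpers, not part of the original slides.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to a single combination of rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full_dependency(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False  # a smaller determinant works, so the dependency is partial
    return True


staff = [
    {"staffNO": "S21", "sName": "Johan", "branchNO": "B005"},
    {"staffNO": "S37", "sName": "Ana", "branchNO": "B003"},
    {"staffNO": "S14", "sName": "Daud", "branchNO": "B003"},
    {"staffNO": "S9", "sName": "Mary", "branchNO": "B007"},
    {"staffNO": "S5", "sName": "Siti", "branchNO": "B003"},
    {"staffNO": "S41", "sName": "Jani", "branchNO": "B005"},
]

print(is_full_dependency(staff, ["staffNO", "sName"], ["branchNO"]))  # False (partial)
print(is_full_dependency(staff, ["staffNO"], ["branchNO"]))           # True (full)
```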
20
Functional Dependencies
Transitive Dependencies:
staff(staffNO, sName, position, salary, *branchNO)
branch(branchNO, bAddress)
staffNO → sName, position, salary, branchNO, bAddress
branchNO → bAddress
This is a transitive dependency: bAddress depends on staffNO via branchNO (staffNO → branchNO and branchNO → bAddress).
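
Given a list of functional dependencies, transitive dependencies can be spotted mechanically by looking for a determinant made up only of non-key attributes. The sketch below assumes the dependency list is supplied by the designer; the function name `transitive_dependencies` is hypothetical.

```python
def transitive_dependencies(fds, primary_key):
    """Return (determinant, attribute) pairs where a determinant consisting only of
    non-key attributes determines another non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs and not (lhs & key):  # determinant contains no key attribute
            for attr in rhs:
                if attr not in key:
                    found.append((tuple(sorted(lhs)), attr))
    return found


# Dependencies from the slide, with staffNO as the primary key.
fds = [
    (["staffNO"], ["sName", "position", "salary", "branchNO", "bAddress"]),
    (["branchNO"], ["bAddress"]),
]

print(transitive_dependencies(fds, ["staffNO"]))  # [(('branchNO',), 'bAddress')]
```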
21
Normalization Process
A formal technique for analyzing relations based on their primary key (or candidate keys) and functional dependencies.
The technique is executed as a series of steps (stages). Each step corresponds to a specific normal form with specific characteristics.
As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to anomalies.
[Figure: data redundancies across the stages 0NF/UNF, 1NF, 2NF, 3NF]
22
Normalization Process
Relationship between the normal forms
[Figure: the normal forms, with arrows labeled Normalization and Denormalization]
Figure 1: Diagrammatic illustration of the relationship between the normal forms
23
Normalization Process
Data sources:
Users
Users' requirements specification
Forms/reports that are used or generated by the enterprise
Sources describing the enterprise, such as the data dictionary and corporate data model
Data sources → Unnormalized Form (UNF): transfer attributes into table format
UNF → First Normal Form (1NF): remove repeating groups
1NF → Second Normal Form (2NF): remove partial dependencies
2NF → Third Normal Form (3NF): remove transitive dependencies
24
Normalization Process
UNF: 1) has repeating groups; 2) PK is not defined.
1NF: 1) remove repeating groups; 2) define the PK (a composite PK consists of several attributes). Test for partial dependency; if any exist, remove them to reach 2NF.
2NF: test for transitive dependency; if any exist, remove them to reach 3NF.
Example dependencies from the diagram:
(a, b → x, y)
(a → c, d)
(b → z)
(c → d)
Diagram labels: (a b … TD) 1, (a … TD) 2, (b … TD) 3
Table counts annotated on the diagram: 1 table; 1 or 2 tables; 2 or 3 tables; 3 or 4 tables (more than 1 table).
Normalization Process: Relation/Table Format
UNF: has repeating groups; PK not defined
1NF: no repeating groups; PK defined; test partial dependency
2NF: no repeating groups; PK defined; no partial dependency; test transitive dependency
3NF: no repeating groups; PK defined; no partial dependency; no transitive dependency
25
Normalization Process
Remember this!
Unnormalized (UNF): there are multivalued attributes or repeating groups; no CK (or PK) is defined.
1NF: no multivalued attributes or repeating groups; has a CK (or PK).
2NF: 1NF with no partial dependencies (PD).
3NF: 2NF with no transitive dependencies (TD).
26
Unnormalized Form (UNF)
To create an unnormalized table:
Transform the data from the information source (e.g. a form) into table format with columns and rows.
Unnormalized Form (UNF/0NF) = a table that contains one or more repeating groups.
27
First Normal Form (1NF)
A relation is in 1NF if every attribute of every tuple has a value and the domain of each attribute cannot be broken down any further (each value is atomic).
First Normal Form (1NF) = a relation in which the intersection of each row and column contains one and only one value.
28
Transforming UNF to 1NF
UNF
STUDENT
Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
                  MSI 415    3.00
125      Johnson  MSI 331    3.00

To-do list:
1. Remove repeating groups
2. Identify composite key

1NF
STUDENT
Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00
29
Example 1: Determine UNF
34
Example 2: Determine UNF
38
Example 3: Determine UNF
40
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Step 1: Eliminate the repeating groups
Eliminate nulls: each repeating group attribute contains an appropriate data value
Step 2: Identify the primary key
Must uniquely identify attribute values
A new key must be composed
Step 3: Identify all dependencies
Dependencies are depicted with a diagram
41
UNF to 1NF
Relational Schema
STUDENT(Stud_ID, Name, (Course_ID, Units))
42
UNF to 1NF
Key Attribute: Stud_ID, Course_ID
Repeating Group: (Course_ID, Units)
43
UNF to 1NF
Relational Schema
STUDENT(Stud_ID, Name, Course_ID, Units)
Ensure a single value at the intersection of each row and column.
Enter appropriate student data in each row.
44
UNF to 1NF
Key Attribute: Stud_ID, Course_ID
Repeating Group: (Course_ID, Units)
45
UNF to 1NF
46
UNF to 1NF
Dependency diagram:
Depicts all dependencies found within a given table structure
Helpful in getting a bird's-eye view of all relationships among a table's attributes
Makes it less likely that you will overlook an important dependency
47
UNF to 1NF
Remove the repeating group by:
Entering appropriate data into the empty columns of rows containing the repeating data.
Filling in the blanks by duplicating the non-repeating data, where required.
This approach is commonly referred to as "flattening" the table.
This approach introduces redundancy into the relation, but the redundancy can be eliminated by the higher normal forms.
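
Flattening can be sketched in a few lines of Python using the STUDENT example. The nested representation (a `Courses` list inside each student record) and the function name `flatten` are assumptions chosen for illustration.

```python
def flatten(unf_rows, repeating_key):
    """Copy the non-repeating values into one output row per repeating-group entry."""
    flat = []
    for row in unf_rows:
        fixed = {k: v for k, v in row.items() if k != repeating_key}
        for group in row[repeating_key]:
            flat.append({**fixed, **group})
    return flat


# UNF: each student carries a repeating group of (Course_ID, Units).
student_unf = [
    {"Stud_ID": 101, "Name": "Lennon", "Courses": [
        {"Course_ID": "MSI 250", "Units": 3.00},
        {"Course_ID": "MSI 415", "Units": 3.00},
    ]},
    {"Stud_ID": 125, "Name": "Johnson", "Courses": [
        {"Course_ID": "MSI 331", "Units": 3.00},
    ]},
]

student_1nf = flatten(student_unf, "Courses")
for row in student_1nf:
    print(row)
```

The result has three rows, with Stud_ID and Name duplicated for student 101, which is exactly the redundancy the later normal forms address.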
48
UNF to 1NF
Example: DreamHome Case Study
A collection of DreamHome lease (rental) forms is shown in Figure 2. The lease on top is for a client called Rannia, who is leasing a property in Skudai, Johor, which is owned by Dolah. For this worked example, we assume that a client rents a given property only once and cannot rent more than one property at any one time.
Sample data is taken from two leases for two different clients, Rannia and Ahmad, and is transformed into table format with rows and columns, as shown in Figure 3. This is an example of an unnormalized table.
49
UNF to 1NF
Page 1   DREAMHOUSE LEASE   Date: 28/02/2007
Client Rental Information
Client Name: Rannia   Client Number: CR76
HouseNo  HouseAddress      RentStart  RentFinish  MonthlyRent  OwnerNo  OwnerName
PG04     Skudai, Johor     1/7/93     31/8/00     750          C040     Dolah
PG16     Ampang, Selangor  1/9/00     1/9/01      850          C093     Abdullah

Page 2   DREAMHOUSE LEASE   Date: 28/02/2007
Client Rental Information
Client Name: Ahmad   Client Number: CR56
HouseNo  HouseAddress           RentStart  RentFinish  MonthlyRent  OwnerNo  OwnerName
PG04     Subang Jaya, Selangor  1/7/93     31/8/00     750          C040     Karim Fendi
PG16     Pasir Gudang, Johor    1/9/00     1/9/01      850          C093     Kasim Selamat

Figure 2: Collection of DreamHome lease (rental) forms
50
UNF to 1NF
Key attribute: clientNo
Repeating group in the unnormalized table: the property rented details, which repeat for each client.
Repeating Group: (houseNo, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)

clientNo  cName   houseNo  houseAdd          rentStart  rentFinish  rent  ownerNo  oName
CR76      Rannia  PG04     Skudai, Johor     1/7/93     31/8/00     750   C040     Dolah
                  PG16     Ampang, Selangor  1/9/00     1/9/01      850   C093     Abdullah
CR56      Ahmad   PG04     Skudai, Johor     20/3/90    19/6/93     750   C040     Dolah
                  PG36     Kuantan, Pahang   21/6/93    3/1/00      1000  C093     Abdullah
                  PG16     Ampang, Selangor  25/1/00    30/8/00     850   C093     Abdullah

Figure 3: ClientRental UNF
51
UNF to 1NF
As a consequence, there are multiple values at the intersection of certain rows and columns.
For example, there are two values for houseNo (PG04 and PG16) for the client Rannia.
To transform an unnormalized table into 1NF, we ensure that there is a single value at the intersection of each row and column. This is achieved by removing the repeating group.
With this approach, we remove the repeating group (house rental details) by entering the appropriate client data into each row.
The resulting First Normal Form (1NF) ClientRental relation is shown in Figure 4.
Key attribute: clientNo
Repeating group in the unnormalized table: the property rented details, which repeat for each client.
Repeating Group: (houseNo, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)
52
UNF to 1NF
clientNo  cName   houseNo  houseAdd          rentStart  rentFinish  rent  ownerNo  oName
CR76      Rannia  PG04     Skudai, Johor     1/7/93     31/8/00     750   C040     Dolah
CR76      Rannia  PG16     Ampang, Selangor  1/9/00     1/9/01      850   C093     Abdullah
CR56      Ahmad   PG04     Skudai, Johor     20/3/90    19/6/93     750   C040     Dolah
CR56      Ahmad   PG36     Kuantan, Pahang   21/6/93    3/1/00      1000  C093     Abdullah
CR56      Ahmad   PG16     Ampang, Selangor  25/1/00    30/8/00     850   C093     Abdullah

Figure 4: 1NF ClientRental relation
ClientRental (clientNo, houseNo, cName, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)
The primary key of the ClientRental relation is a composite key consisting of clientNo and houseNo.
The ClientRental relation is in 1NF, as there is a single value at the intersection of each row and column.
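
The choice of composite key can be sanity-checked against the Figure 4 rows: neither clientNo nor houseNo is unique on its own, but the pair is. This is a sketch only; `is_unique` is a hypothetical helper, and uniqueness in sample data suggests a key but does not prove one.

```python
from collections import Counter


def is_unique(rows, columns):
    """Return True if no two rows share the same values for the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(n == 1 for n in counts.values())


# Key-relevant columns of the 1NF ClientRental relation (Figure 4).
client_rental = [
    {"clientNo": "CR76", "houseNo": "PG04", "cName": "Rannia"},
    {"clientNo": "CR76", "houseNo": "PG16", "cName": "Rannia"},
    {"clientNo": "CR56", "houseNo": "PG04", "cName": "Ahmad"},
    {"clientNo": "CR56", "houseNo": "PG36", "cName": "Ahmad"},
    {"clientNo": "CR56", "houseNo": "PG16", "cName": "Ahmad"},
]

print(is_unique(client_rental, ["clientNo"]))             # False
print(is_unique(client_rental, ["houseNo"]))              # False
print(is_unique(client_rental, ["clientNo", "houseNo"]))  # True
```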
53
UNF to 1NF
The relation in Figure 4 contains data describing the client, the house for rent, and the owner of the house, which is repeated several times.
As a consequence, the ClientRental relation contains significant data redundancy.
If implemented, the 1NF relation would be subject to update anomalies.
To remove some of these, transform 1NF → 2NF.
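
A small sketch of an update anomaly on the 1NF ClientRental data: owner C093 appears in three rows, so changing the owner's name in only one of them leaves the relation inconsistent. The new name used here is a made-up value for illustration, and `owner_names` is a hypothetical helper.

```python
def owner_names(rows, owner_no):
    """Return the set of distinct names recorded for one owner number."""
    return {row["oName"] for row in rows if row["ownerNo"] == owner_no}


# Rows of the 1NF ClientRental relation that mention owner C093.
rows = [
    {"clientNo": "CR76", "houseNo": "PG16", "ownerNo": "C093", "oName": "Abdullah"},
    {"clientNo": "CR56", "houseNo": "PG36", "ownerNo": "C093", "oName": "Abdullah"},
    {"clientNo": "CR56", "houseNo": "PG16", "ownerNo": "C093", "oName": "Abdullah"},
]

print(len(owner_names(rows, "C093")))  # 1: the data is consistent

# Update the name in a single row only (hypothetical new value).
rows[0]["oName"] = "A. Abdullah"

print(len(owner_names(rows, "C093")))  # 2: the same owner now has two names
```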
54
Borrow(bID, bDATE, mID, mNAME, mADD, ISBN, title, rDATE)
Functional dependencies:
bID, ISBN → bDATE (PK)
bID → mID, mNAME, mADD, rDATE (PD)
ISBN → title (PD)
mID → mNAME, mADD (TD)
55
INSURANCE(cNO, cNAME, cADD, pNO, pTYPE, pDATE, purchaseDATE, pPRICE, aNO, aNAME, aTEL)
Functional dependencies:
cNO, pNO, aNO → purchaseDATE (PK)
cNO → cNAME, cADD (PD)
pNO → pTYPE, pDATE, pPRICE (PD)
aNO → aNAME, aTEL (PD)
56
RECEIPT(rNO, date, tNO, sID, sNAME, iNO, mNO, mDES, qty, price, amt, total, pay, balance)
Functional dependencies:
rNO, mNO → qty, iNO, amt (PK)
rNO → date, tNO, sID, sNAME, total, pay, balance (PD)
mNO → mDES, price (PD)
sID → sNAME (TD)
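
One way to confirm that a proposed primary key really determines every attribute is to compute its attribute closure under the listed dependencies. The sketch below applies the standard closure procedure to the RECEIPT example; the function name `closure` and the tuple encoding of the dependencies are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole determinant is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


receipt_attrs = {"rNO", "date", "tNO", "sID", "sNAME", "iNO", "mNO",
                 "mDES", "qty", "price", "amt", "total", "pay", "balance"}
receipt_fds = [
    (("rNO", "mNO"), ("qty", "iNO", "amt")),
    (("rNO",), ("date", "tNO", "sID", "sNAME", "total", "pay", "balance")),
    (("mNO",), ("mDES", "price")),
    (("sID",), ("sNAME",)),
]

print(closure({"rNO", "mNO"}, receipt_fds) == receipt_attrs)  # True: (rNO, mNO) is a key
print(closure({"rNO"}, receipt_fds) == receipt_attrs)         # False: rNO alone is not
```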
57
Second Normal Form (2NF)
Based on the concept of partial dependency: a dependency on only a part of a composite primary key.
2NF applies to relations with composite keys, that is, relations with a PK composed of two or more attributes.
A 1NF relation with a single-attribute PK is automatically in at least 2NF.
Second Normal Form (2NF) = a relation that is in 1NF and in which every non-PK attribute is fully functionally dependent on the PK.
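
The 2NF test can be sketched as a search for partial dependencies: any listed dependency whose determinant is a proper, non-empty subset of the primary key and which determines a non-key attribute. The example uses the Borrow dependencies from the earlier slide; the function names are hypothetical, and the sketch assumes the relation is already in 1NF and that the dependency list is complete.

```python
def partial_dependencies(fds, primary_key):
    """Return (determinant, non-key attributes) for each FD whose determinant is a
    proper, non-empty subset of the primary key."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        non_key = tuple(a for a in rhs if a not in key)
        if set(lhs) and set(lhs) < key and non_key:
            found.append((tuple(lhs), non_key))
    return found


def is_2nf(fds, primary_key):
    """A 1NF relation is in 2NF when it has no partial dependencies."""
    return not partial_dependencies(fds, primary_key)


borrow_fds = [
    (("bID", "ISBN"), ("bDATE",)),
    (("bID",), ("mID", "mNAME", "mADD", "rDATE")),
    (("ISBN",), ("title",)),
    (("mID",), ("mNAME", "mADD")),
]

for determinant, dependents in partial_dependencies(borrow_fds, ("bID", "ISBN")):
    print(determinant, "->", dependents)
print(is_2nf(borrow_fds, ("bID", "ISBN")))  # False
```

With a single-attribute key there is no proper, non-empty subset of the key, so `partial_dependencies` returns an empty list, which matches the statement that such relations are automatically in at least 2NF.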
58
CONTINUE TO NEXT PRESENTATION
TRANSFORMING UNF TO 2NF