Normalization Rules for Database Tables Northern Arizona University College of Business Administration


Normalization Rules for Database Tables

Northern Arizona University

College of Business Administration


Normalization - Some Definitions

• A relation is a two-dimensional array with a single-valued entry in each cell, no duplicate rows, and columns whose meaning is the same across all rows.

– All tables used in the relational model must be relations.

• Normalization is a process for evaluating table structures and reorganizing them as needed to produce a set of stable, well-structured relations.

• An anomaly is a condition that interferes with the storage or retention of data, or creates the potential for inconsistent data.

– There are insertion, modification, and deletion anomalies.

– The normalization process should eliminate anomalies.


Unnormalized Tables

• An unnormalized table is a table that does not meet the definition of a relation:

– it contains rows with multiple values for an attribute (repeating groups), or

– it contains duplicate rows.

• A table is said to be in first normal form if it meets the definition of a relation.

– Generally this means it contains no repeating groups of attributes.

The next slide shows an example of an unnormalized table.


EMPLOYEE

E ID#   E Name     Dept.   Hire Date    Skill
34567   J. Jones   MKT     8/12/1992    Feng Shui
                                        Origami
19203   A. Davis   PROD    7/18/1987    Judo
29073   B. Evans   MKT     9/2/1995
46072   L. Adams   ACC     11/17/1992   Judo
                                        Origami
52051   S. Smith   PROD    1/28/1996    PhotoShop

This EMPLOYEE table is unnormalized: it has cells that do not contain single-valued entries.

As shown, this table has no logical primary key, because E ID# does not functionally determine the value of Skill.
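
The single-valued-cell rule can be checked mechanically. The following is a minimal sketch, assuming the table is held in memory as a list of dictionaries and that a Python list stands in for a multi-valued cell; the column names and the helper `multi_valued_cells` are illustrative, not part of the slides.

```python
# Hypothetical in-memory copy of the unnormalized EMPLOYEE table.
# A list in the "skill" cell stands in for a multi-valued entry.
employee = [
    {"e_id": 34567, "name": "J. Jones", "dept": "MKT", "skill": ["Feng Shui", "Origami"]},
    {"e_id": 19203, "name": "A. Davis", "dept": "PROD", "skill": ["Judo"]},
    {"e_id": 29073, "name": "B. Evans", "dept": "MKT", "skill": []},
    {"e_id": 46072, "name": "L. Adams", "dept": "ACC", "skill": ["Judo", "Origami"]},
    {"e_id": 52051, "name": "S. Smith", "dept": "PROD", "skill": ["PhotoShop"]},
]


def multi_valued_cells(rows):
    """Return (E ID#, column) pairs whose cell holds more than one value."""
    return [
        (row["e_id"], column)
        for row in rows
        for column, value in row.items()
        if isinstance(value, list) and len(value) > 1
    ]


violations = multi_valued_cells(employee)
print(violations)  # [(34567, 'skill'), (46072, 'skill')]
```

Any non-empty result means the table does not meet the definition of a relation.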


EMPLOYEE

E ID#   E Name     Dept.   Hire Date    Skills   Skill 1     Skill 2   Skill 3
34567   J. Jones   MKT     8/12/1992    2        Feng Shui   Origami
19203   A. Davis   PROD    7/18/1987    1        Judo
29073   B. Evans   MKT     9/2/1995     0
46072   L. Adams   ACC     11/17/1992   2        Judo        Origami
52051   S. Smith   PROD    1/28/1996    1        PhotoShop

The above employee table shows the same set of data as the previous slide. It has been reorganized into a form that could be implemented under some file processing systems, using COBOL, for instance.

However, it is still not in a form that can be used by the relational model. The Skill columns are a repeating group of multi-valued attributes that cannot be identified by the primary key.


Eliminating Repeating Groups

• In most cases, unnormalized tables can be converted to sets of tables that are in at least first normal form by placing any repeating groups of fields in a separate table that includes the primary key attribute from the original table along with a single occurrence of the repeating attribute (Skill in our example).

Original Table

EMPLOYEE

E ID#   E Name     Dept.   Hire Date    Skill
34567   J. Jones   MKT     8/12/1992    Feng Shui
                                        Origami
19203   A. Davis   PROD    7/18/1987    Judo
29073   B. Evans   MKT     9/2/1995
46072   L. Adams   ACC     11/17/1992   Judo
                                        Origami
52051   S. Smith   PROD    1/28/1996    PhotoShop

A table is in first normal form if it contains no multi-valued attributes.


Eliminating Repeating Groups

Original

EMPLOYEE

E ID#   E Name     Dept.   Hire Date    Skill
34567   J. Jones   MKT     8/12/1992    Feng Shui
                                        Origami
19203   A. Davis   PROD    7/18/1987    Judo
29073   B. Evans   MKT     9/2/1995
46072   L. Adams   ACC     11/17/1992   Judo
                                        Origami
52051   S. Smith   PROD    1/28/1996    PhotoShop

Normalized

EMPLOYEE

E ID#   E Name     Dept.   Hire Date
34567   J. Jones   MKT     8/12/1992
19203   A. Davis   PROD    7/18/1987
29073   B. Evans   MKT     9/2/1995
46072   L. Adams   ACC     11/17/1992
52051   S. Smith   PROD    1/28/1996

EMPLOYEE_SKILL

E ID#   Skill
34567   Feng Shui
34567   Origami
19203   Judo
46072   Judo
46072   Origami
52051   PhotoShop
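
The split shown above can be sketched in a few lines. This is an illustrative version only: it assumes the unnormalized rows are tuples whose last element is a list of skills, and the function name `to_first_normal_form` is made up for the example.

```python
# Unnormalized rows: (E ID#, name, dept, hire date, list of skills).
unnormalized = [
    (34567, "J. Jones", "MKT", "8/12/1992", ["Feng Shui", "Origami"]),
    (19203, "A. Davis", "PROD", "7/18/1987", ["Judo"]),
    (29073, "B. Evans", "MKT", "9/2/1995", []),
    (46072, "L. Adams", "ACC", "11/17/1992", ["Judo", "Origami"]),
    (52051, "S. Smith", "PROD", "1/28/1996", ["PhotoShop"]),
]


def to_first_normal_form(rows):
    """Move the repeating Skill group into its own table keyed by E ID#."""
    employee = [(e_id, name, dept, hired) for e_id, name, dept, hired, _ in rows]
    employee_skill = [
        (e_id, skill)            # one row per single occurrence of Skill
        for e_id, *_, skills in rows
        for skill in skills
    ]
    return employee, employee_skill


employee, employee_skill = to_first_normal_form(unnormalized)
print(len(employee), len(employee_skill))  # 5 6
```

B. Evans, who has no skills, still appears in EMPLOYEE but contributes no rows to EMPLOYEE_SKILL.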


Logical ER Diagram in ER Studio Notation

• 1st Normal Form Example


This Table is in First Normal Form

Schedule of Classes

Course #   Section   Descr.         Time        Room   Cr. Hrs.
CIS 120    1         Intro to CIS   8:00 MWF    200    4
CIS 220    2         VB Prog.       9:35 TTh    207    3
ACC 255    1         Fin. Acct.     9:10 MWF    207    2
CIS 220    1         VB Prog.       8:00 MWF    203    3
CIS 120    2         Intro to CIS   10:20 MWF   200    4

There are no repeating groups of attributes.

NOTE: The primary key of this table is a concatenated key. No single attribute uniquely identifies a row of the table, but the combination of Course # and Section does. If I know that the course is CIS 120 and the section is 1, I can identify a unique schedule occurrence.
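
The claim about the concatenated key can be verified against the sample rows. A minimal sketch, assuming rows are stored as tuples and columns are addressed by position; `is_unique_key` is a hypothetical helper.

```python
from collections import Counter

# Schedule of Classes: (course #, section, descr., time, room, cr. hrs.)
schedule = [
    ("CIS 120", 1, "Intro to CIS", "8:00 MWF", 200, 4),
    ("CIS 220", 2, "VB Prog.", "9:35 TTh", 207, 3),
    ("ACC 255", 1, "Fin. Acct.", "9:10 MWF", 207, 2),
    ("CIS 220", 1, "VB Prog.", "8:00 MWF", 203, 3),
    ("CIS 120", 2, "Intro to CIS", "10:20 MWF", 200, 4),
]


def is_unique_key(rows, columns):
    """True if the chosen column positions identify every row uniquely."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(n == 1 for n in counts.values())


course_only = is_unique_key(schedule, [0])             # False: CIS 120 appears twice
section_only = is_unique_key(schedule, [1])            # False: section 1 appears three times
course_and_section = is_unique_key(schedule, [0, 1])   # True
```

Uniqueness in sample data only shows that a key is plausible; whether it holds in general is a business rule.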


Although This Table Is in First Normal Form, It Contains Anomalies

Schedule of Classes

Course #   Section   Descr.         Time        Room   Cr. Hrs.
CIS 120    1         Intro to CIS   8:00 MWF    200    4
CIS 220    2         VB Prog.       9:35 TTh    207    3
ACC 255    1         Fin. Acct.     9:10 MWF    207    2
CIS 220    1         VB Prog.       8:00 MWF    203    3
CIS 120    2         Intro to CIS   10:20 MWF   200    4

If the description of CIS 220 changes from VB Prog. to Visual C#, I must record the new value in two places (as shown). This is a modification anomaly.


Although This Table Is in First Normal Form, It Contains Anomalies

Schedule of Classes

Course #   Section   Descr.         Time        Room   Cr. Hrs.
CIS 120    1         Intro to CIS   8:00 MWF    200    4
CIS 220    2         VB Prog.       9:35 TTh    207    3
ACC 255    1         Fin. Acct.     9:10 MWF    207    2
CIS 220    1         VB Prog.       8:00 MWF    203    3
CIS 120    2         Intro to CIS   10:20 MWF   200    4

If a new course has been designed and I know its description and credit hours (ACC 266, Pers. Acc., 2 hrs), I still cannot record this data until at least one section of the course is offered. This is an insertion anomaly.


A Table in First Normal Form Containing Anomalies

Schedule of Classes

Course #   Section   Descr.         Time        Room   Cr. Hrs.
CIS 120    1         Intro to CIS   8:00 MWF    200    4
CIS 220    2         VB Prog.       9:35 TTh    207    3
ACC 255    1         Fin. Acct.     9:10 MWF    207    2
CIS 220    1         VB Prog.       8:00 MWF    203    3
CIS 120    2         Intro to CIS   10:20 MWF   200    4

If no section of ACC 255 is offered this semester, I will lose the information about the description and credit hours of this course. This is a deletion anomaly.


Schedule of Classes

Course #   Section   Descr.         Time        Room   Cr. Hrs.
CIS 120    1         Intro to CIS   8:00 MWF    200    4
CIS 220    2         VB Prog.       9:35 TTh    207    3
ACC 255    1         Fin. Acct.     9:10 MWF    207    2
CIS 220    1         VB Prog.       11:10 TTh   203    3
CIS 120    2         Intro to CIS   10:20 MWF   200    4

This table has anomalies because it contains partial dependencies.

A partial dependency occurs when one or more attributes in a table depend on (are functionally determined by) only a portion of a concatenated primary key.

In this case the Descr. and Cr. Hrs. attributes depend only on Course #. To correct this problem, the attributes determined by only part of the key should be placed in a separate table. Its primary key will be the portion of the original primary key required to identify them.
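
A functional dependency can be tested against sample rows: if two rows agree on the determinant but disagree on the dependent attributes, the dependency does not hold. The sketch below assumes tuple rows and positional column constants; `determines` is a hypothetical helper. Sample data can refute a dependency but cannot prove one.

```python
COURSE, SECTION, DESCR, TIME, ROOM, CR_HRS = range(6)

schedule = [
    ("CIS 120", 1, "Intro to CIS", "8:00 MWF", 200, 4),
    ("CIS 220", 2, "VB Prog.", "9:35 TTh", 207, 3),
    ("ACC 255", 1, "Fin. Acct.", "9:10 MWF", 207, 2),
    ("CIS 220", 1, "VB Prog.", "11:10 TTh", 203, 3),
    ("CIS 120", 2, "Intro to CIS", "10:20 MWF", 200, 4),
]


def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Descr. and Cr. Hrs. depend on Course # alone: a partial dependency.
partial = determines(schedule, [COURSE], [DESCR, CR_HRS])          # True
# Time and Room need the whole concatenated key.
course_gives_time = determines(schedule, [COURSE], [TIME, ROOM])   # False
full_key = determines(schedule, [COURSE, SECTION], [TIME, ROOM])   # True
```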


Original

Schedule of Classes

Course #   Section   Descr.         Time        Room   Cr. Hrs.
CIS 120    1         Intro to CIS   8:00 MWF    200    4
CIS 220    2         VB Prog.       9:35 TTh    207    3
ACC 255    1         Fin. Acct.     9:10 MWF    207    2
CIS 220    1         VB Prog.       11:10 TTh   203    3
CIS 120    2         Intro to CIS   10:20 MWF   200    4

Revised

Schedule of Classes

Course #   Section   Time        Room
CIS 120    1         8:00 MWF    200
CIS 220    2         9:35 TTh    207
ACC 255    1         9:10 MWF    207
CIS 220    1         11:10 TTh   203
CIS 120    2         10:20 MWF   200

COURSE

Course #   Descr.         Cr. Hrs.
CIS 120    Intro to CIS   4
CIS 220    VB Prog.       3
ACC 255    Fin. Acct.     2

Notice how this structure eliminates the anomalies we found.
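
To see the three anomalies disappear, the revised structure can be tried in an in-memory SQLite database. This is a sketch under assumptions: the table and column names (`course`, `schedule`, `course_no`, and so on) are chosen for the example and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_no TEXT PRIMARY KEY,
        descr     TEXT,
        cr_hrs    INTEGER
    );
    CREATE TABLE schedule (
        course_no TEXT REFERENCES course(course_no),
        section   INTEGER,
        time      TEXT,
        room      INTEGER,
        PRIMARY KEY (course_no, section)   -- concatenated key
    );
""")
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [
    ("CIS 120", "Intro to CIS", 4), ("CIS 220", "VB Prog.", 3), ("ACC 255", "Fin. Acct.", 2),
])
conn.executemany("INSERT INTO schedule VALUES (?, ?, ?, ?)", [
    ("CIS 120", 1, "8:00 MWF", 200), ("CIS 220", 2, "9:35 TTh", 207),
    ("ACC 255", 1, "9:10 MWF", 207), ("CIS 220", 1, "11:10 TTh", 203),
    ("CIS 120", 2, "10:20 MWF", 200),
])

# Modification: the description is stored once, so one row changes.
rows_changed = conn.execute(
    "UPDATE course SET descr = 'Visual C#' WHERE course_no = 'CIS 220'"
).rowcount

# Insertion: a new course can be recorded before any section exists.
conn.execute("INSERT INTO course VALUES ('ACC 266', 'Pers. Acc.', 2)")

# Deletion: removing the only ACC 255 section keeps the course details.
conn.execute("DELETE FROM schedule WHERE course_no = 'ACC 255'")
acc_255 = conn.execute(
    "SELECT descr, cr_hrs FROM course WHERE course_no = 'ACC 255'"
).fetchone()
print(rows_changed, acc_255)  # 1 ('Fin. Acct.', 2)
```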


Logical ER Diagram in ER Studio Notation

• 2nd Normal Form Example


Second Normal Form

• Partial Dependencies occur when nonkey attributes are functionally determined by only a portion of a concatenated primary key.

• Partial dependencies can occur only in tables with a concatenated key.

• Partial dependencies can be corrected by removing those attributes to a separate table whose primary key is just the portion of the key from the original table needed to functionally determine them.

• A table is in second Normal Form if it is in first normal form and it contains no partial dependencies.


A Table in Second Normal Form Which Has Anomalies

PROFESSOR

Prof. #   P. Name      Dept Code   Office #   Dept Aide   Dept Fax #
B17       B. Brown     ECO         263        D. Davis    523-7441
J12       J. Jones     ACC         243        W. Smith    523-7318
M22       M. Morris    ECO         257        D. Davis    523-7441
L29       L. Lawless   CIS         222        C. Coles    523-7318
L22       L. Lewis     ACC         248        W. Smith    523-7318
W26       W. West      ECO         257        D. Davis    523-7441

This table is in second normal form because it has no repeating groups of attributes (first normal form) and its primary key is not concatenated, so it cannot contain partial dependencies. However, the table above still has anomalies.


Anomalies in the Example Professor Table

Modification Anomalies - If the Dept Aide serving the ECO department changes, or if the Fax # of the ECO department changes, the change would need to be made in several records.

Insertion Anomalies - I want to start a new department and have a Dept Code, a Dept Aide, and a Dept Fax # (e.g., MKT, T. Taylor, 523-7216). I can't add this data to the table until at least one professor is hired to teach in this new department.

PROFESSOR

Prof. #   P. Name      Dept Code   Office #   Dept Aide   Dept Fax #
B17       B. Brown     ECO         263        D. Davis    523-7441
J12       J. Jones     ACC         243        W. Smith    523-7318
M22       M. Morris    ECO         257        D. Davis    523-7441
L29       L. Lawless   CIS         222        C. Coles    523-7318
L22       L. Lewis     ACC         248        W. Smith    523-7318
W26       W. West      ECO         257        D. Davis    523-7441


Anomalies in the Example Professor Table

Deletion Anomalies - If Prof # L29 (the only professor in the CIS department in our example table) is deleted, we would lose the information about the name of the Dept Aide for CIS and the Dept Fax # for CIS.

PROFESSOR

Prof. #   P. Name      Dept Code   Office #   Dept Aide   Dept Fax #
B17       B. Brown     ECO         263        D. Davis    523-7441
J12       J. Jones     ACC         243        W. Smith    523-7318
M22       M. Morris    ECO         257        D. Davis    523-7441
L29       L. Lawless   CIS         222        C. Coles    523-7318
L22       L. Lewis     ACC         248        W. Smith    523-7318
W26       W. West      ECO         257        D. Davis    523-7441


This Professor Table Has Transitive Dependencies

PROFESSOR

Prof. #   P. Name      Dept Code   Office #   Dept Aide   Dept Fax #
B17       B. Brown     ECO         263        D. Davis    523-7441
J12       J. Jones     ACC         243        W. Smith    523-7318
M22       M. Morris    ECO         257        D. Davis    523-7441
L29       L. Lawless   CIS         222        C. Coles    523-7318
L22       L. Lewis     ACC         248        W. Smith    523-7318
W26       W. West      ECO         257        D. Davis    523-7441

The anomalies we have found occur because the Professor table has transitive dependencies.

Dept Code, Dept Aide, and Dept Fax # are all attributes of a DEPARTMENT entity, which is uniquely identified by Dept Code. If I know Dept Code, I can uniquely identify Dept Aide and Dept Fax #.

Knowing Prof # allows me to identify these attributes, but only through a chain of inferences: Prof # uniquely identifies Dept Code, which in turn uniquely identifies the other DEPARTMENT attributes.

The anomalies can be resolved by moving the attributes determined by a non-key attribute to a separate table.
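
The chain of inferences can be observed in the sample rows by grouping on Dept Code. A minimal sketch, assuming tuple rows in the column order of the table above; the variable names are illustrative.

```python
# PROFESSOR: (prof #, name, dept code, office #, dept aide, dept fax #)
professor = [
    ("B17", "B. Brown", "ECO", 263, "D. Davis", "523-7441"),
    ("J12", "J. Jones", "ACC", 243, "W. Smith", "523-7318"),
    ("M22", "M. Morris", "ECO", 257, "D. Davis", "523-7441"),
    ("L29", "L. Lawless", "CIS", 222, "C. Coles", "523-7318"),
    ("L22", "L. Lewis", "ACC", 248, "W. Smith", "523-7318"),
    ("W26", "W. West", "ECO", 257, "D. Davis", "523-7441"),
]

# Collect the distinct (aide, fax) pairs seen for each Dept Code.
by_dept = {}
for _, _, dept, _, aide, fax in professor:
    by_dept.setdefault(dept, set()).add((aide, fax))

# Every department maps to exactly one (aide, fax) pair, so the
# non-key attribute Dept Code determines Dept Aide and Dept Fax #.
dept_determines_aide_fax = all(len(pairs) == 1 for pairs in by_dept.values())

# Rows that merely repeat department facts already stored elsewhere.
redundant_rows = len(professor) - len(by_dept)
print(dept_determines_aide_fax, redundant_rows)  # True 3
```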


Correcting Transitive Dependencies

Original

PROFESSOR

Prof. #   P. Name      Dept.   Office #   D. Aide    Fax #
B17       B. Brown     ECO     263        D. Davis   523-7441
J12       J. Jones     ACC     243        W. Smith   523-7318
M22       M. Morris    ECO     257        D. Davis   523-7441
L29       L. Lawless   CIS     222        C. Coles   523-6899
L22       L. Lewis     ACC     248        W. Smith   523-7318
W26       W. West      ECO     257        D. Davis   523-7441

Revised

PROFESSOR

Prof. #   P. Name      Dept.   Office #
B17       B. Brown     ECO     263
J12       J. Jones     ACC     243
M22       M. Morris    ECO     257
L29       L. Lawless   CIS     222
L22       L. Lewis     ACC     248
W26       W. West      ECO     257

DEPARTMENT

Dept.   D. Aide    Fax #
ECO     D. Davis   523-7441
ACC     W. Smith   523-7318
CIS     C. Coles   523-6899
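
The revised structure can be exercised in SQLite to confirm that a join recovers the original rows and that department facts now live in one place. This is a sketch only; the table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_code TEXT PRIMARY KEY,
        dept_aide TEXT,
        dept_fax  TEXT
    );
    CREATE TABLE professor (
        prof_no   TEXT PRIMARY KEY,
        p_name    TEXT,
        dept_code TEXT REFERENCES department(dept_code),  -- foreign key
        office_no INTEGER
    );
""")
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    ("ECO", "D. Davis", "523-7441"), ("ACC", "W. Smith", "523-7318"),
    ("CIS", "C. Coles", "523-6899"),
])
conn.executemany("INSERT INTO professor VALUES (?, ?, ?, ?)", [
    ("B17", "B. Brown", "ECO", 263), ("J12", "J. Jones", "ACC", 243),
    ("M22", "M. Morris", "ECO", 257), ("L29", "L. Lawless", "CIS", 222),
    ("L22", "L. Lewis", "ACC", 248), ("W26", "W. West", "ECO", 257),
])

# Joining on the foreign key rebuilds the original six-column rows.
rejoined = conn.execute("""
    SELECT p.prof_no, p.p_name, p.dept_code, p.office_no, d.dept_aide, d.dept_fax
    FROM professor AS p
    JOIN department AS d ON p.dept_code = d.dept_code
    ORDER BY p.prof_no
""").fetchall()

# Insertion: a department can exist before it has any professors.
conn.execute("INSERT INTO department VALUES ('MKT', 'T. Taylor', '523-7216')")

# Deletion: removing the only CIS professor keeps the CIS department row.
conn.execute("DELETE FROM professor WHERE prof_no = 'L29'")
cis = conn.execute(
    "SELECT dept_aide, dept_fax FROM department WHERE dept_code = 'CIS'"
).fetchone()
print(len(rejoined), cis)  # 6 ('C. Coles', '523-6899')
```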


Logical ER Diagram in ER Studio Notation

• 3rd Normal Form Example


Third Normal Form

• Transitive dependencies occur when non-key attributes are functionally determined by other non-key attributes.

• Transitive dependencies can be corrected by removing the attributes to a separate table whose primary key is the attribute of the original table which functionally determines them.

• The functionally determining attribute serves as a foreign key in the original table.

• A table is in Third Normal Form if it is in second normal form and it contains no transitive dependencies.


A Proposed Normalization Process for Database Designers

Examine each table of the proposed structure and perform the following operations:

• Remove any repeating groups of attributes (multi-valued attributes) to a separate table. If there are independent sets of multi-valued attributes, place each set in a separate table.

• Remove any attributes that are functionally determined by only a portion of a concatenated key to a separate table.

• Remove any attributes that are functionally determined by a non-key attribute to a separate table.
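
The second and third steps amount to comparing each determinant with the primary key. The sketch below is a simplification that assumes a single primary key and a hand-written list of functional dependencies; `classify` and the attribute names are hypothetical.

```python
def classify(key, dependencies):
    """Label each dependent attribute by how its determinant relates to the key.

    key          -- attributes of the primary key
    dependencies -- list of (determinant attributes, dependent attributes)
    """
    key = frozenset(key)
    labels = {}
    for determinant, dependents in dependencies:
        det = frozenset(determinant)
        if det == key:
            kind = "full"          # stays in the table
        elif det < key:
            kind = "partial"       # part of a concatenated key: violates 2NF
        else:
            kind = "transitive"    # determined by a non-key attribute: violates 3NF
        for attribute in dependents:
            labels[attribute] = kind
    return labels


schedule_report = classify(
    ("course_no", "section"),
    [(("course_no", "section"), ("time", "room")),
     (("course_no",), ("descr", "cr_hrs"))],
)
professor_report = classify(
    ("prof_no",),
    [(("prof_no",), ("p_name", "dept_code", "office_no")),
     (("dept_code",), ("dept_aide", "dept_fax"))],
)
print(schedule_report["descr"], professor_report["dept_aide"])  # partial transitive
```

Attributes labeled partial or transitive are the ones to move to a separate table, keyed by their determinant.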


Review Question: What normalization rule(s) are violated by the table below? How would you revise the table structure?

Emp No   Date        Emp. Name   Hours Worked   Emp Class   Wage Rate   Units Produced
103      9/27/2008   Jones       7.5            1           $5.75       325
101      9/27/2008   Downs       8              2           $8.00       350
102      9/27/2008   Eaves       6              1           $5.75       415
101      9/28/2008   Downs       4.5            2           $8.00       315
102      9/28/2008   Eaves       8              1           $5.75       300
104      9/28/2008   Smith       10             2           $8.00       300

Write out your answer on a piece of scratch paper.


Review Question Solution:

Original Table: This table violates both 2nd and 3rd normal forms.

Emp. Name and Emp Class both depend only on Emp No, which is part of the concatenated key (Emp No, Date). This violates 2nd normal form.

Wage Rate is actually determined by Emp Class, a non-key attribute. This violates 3rd normal form.

Emp No   Date        Emp. Name   Hours Worked   Emp Class   Wage Rate   Units Produced
103      9/27/2008   Jones       7.5            1           $5.75       325
101      9/27/2008   Downs       8              2           $8.00       350
102      9/27/2008   Eaves       6              1           $5.75       415
101      9/28/2008   Downs       4.5            2           $8.00       315
102      9/28/2008   Eaves       8              1           $5.75       300
104      9/28/2008   Smith       10             2           $8.00       300

Normalized Tables

Employee Class

Emp. Class   Wage Rate
1            $5.75
2            $8.00

Employee

Emp No   Emp. Name   Emp. Class
103      Jones       1
101      Downs       2
102      Eaves       1
104      Smith       2

Employee Hours

Emp No   Date        Hours Worked   Units Produced
103      9/27/2008   7.5            325
101      9/27/2008   8              350
102      9/27/2008   6              415
101      9/28/2008   4.5            315
102      9/28/2008   8              300
104      9/28/2008   10             300
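
One way to check a proposed solution is to rejoin the normalized tables and compare the result with the original rows. The sketch below assumes plain Python dictionaries and tuples as stand-ins for the three tables; the variable names are illustrative.

```python
# Normalized tables from the solution.
employee_class = {1: 5.75, 2: 8.00}                       # class -> wage rate
employee = {103: ("Jones", 1), 101: ("Downs", 2),
            102: ("Eaves", 1), 104: ("Smith", 2)}         # emp no -> (name, class)
employee_hours = [                                        # (emp no, date, hours, units)
    (103, "9/27/2008", 7.5, 325), (101, "9/27/2008", 8, 350),
    (102, "9/27/2008", 6, 415), (101, "9/28/2008", 4.5, 315),
    (102, "9/28/2008", 8, 300), (104, "9/28/2008", 10, 300),
]

# Original table: (emp no, date, name, hours, class, wage rate, units).
original = [
    (103, "9/27/2008", "Jones", 7.5, 1, 5.75, 325),
    (101, "9/27/2008", "Downs", 8, 2, 8.00, 350),
    (102, "9/27/2008", "Eaves", 6, 1, 5.75, 415),
    (101, "9/28/2008", "Downs", 4.5, 2, 8.00, 315),
    (102, "9/28/2008", "Eaves", 8, 1, 5.75, 300),
    (104, "9/28/2008", "Smith", 10, 2, 8.00, 300),
]

# Follow Emp No to Employee, then Emp Class to Employee Class.
rejoined = []
for emp_no, date, hours, units in employee_hours:
    name, emp_class = employee[emp_no]
    rejoined.append((emp_no, date, name, hours, emp_class,
                     employee_class[emp_class], units))

lossless = rejoined == original
print(lossless)  # True
```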


Logical ER Diagram in ER Studio Notation

• Review Question Example


Merging Relations

• View Integration: combining entities from multiple ER models into common relations

• Issues to watch out for when merging entities from different ER models:

– Synonyms: two or more attributes with different names but the same meaning

– Homonyms: attributes with the same name but different meanings

– Transitive dependencies: even if relations are in 3NF prior to merging, they may not be after merging

– Supertype/subtype relationships: may be hidden prior to merging


Enterprise Keys

• Primary keys that are unique in the whole database, not just within a single relation

• Corresponds with the concept of an object ID in object-oriented systems
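
As a rough illustration of the idea, a single counter can issue identifiers for rows of every relation, so no two rows in the database share a key. Everything in this sketch (the `object_registry` mapping, the `new_row` helper, and the sample rows) is hypothetical and not taken from the slides.

```python
import itertools

_next_oid = itertools.count(1)   # one sequence shared by all relations
object_registry = {}             # OID -> name of the relation that owns the row


def new_row(relation, **attributes):
    """Create a row whose key is unique across the whole database."""
    oid = next(_next_oid)
    object_registry[oid] = relation
    return {"oid": oid, **attributes}


# Hypothetical rows in two different relations.
employee_row = new_row("EMPLOYEE", name="J. Jones")
customer_row = new_row("CUSTOMER", name="Example Customer")

print(employee_row["oid"], customer_row["oid"])  # 1 2
```

Because the key carries no business meaning, the registry is what tells you which relation a given OID belongs to.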


Figure 4-31 Enterprise keys

a) Relations with enterprise key

b) Sample data with enterprise key


Mapping Unary Relationships

• One-to-Many: recursive foreign key in the same relation

• Many-to-Many: two relations:

– One for the entity type

– One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
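
The one-to-many case can be sketched with a recursive foreign key in SQLite. The column names (`emp_id`, `manager_id`) and the sample rows are assumptions made for this example; they are not the attributes shown in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        emp_name   TEXT NOT NULL,
        -- Recursive foreign key: points back at the same relation.
        manager_id INTEGER REFERENCES employee(emp_id)
    );
    INSERT INTO employee VALUES (1, 'A. Davis', NULL);
    INSERT INTO employee VALUES (2, 'J. Jones', 1);
    INSERT INTO employee VALUES (3, 'B. Evans', 1);
""")

# A self-join pairs each employee with the employee who manages them.
reports = conn.execute("""
    SELECT e.emp_name, m.emp_name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()
print(reports)  # [('J. Jones', 'A. Davis'), ('B. Evans', 'A. Davis')]
```

The top-level manager has a NULL `manager_id` and therefore does not appear on the left side of the join.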


Figure 4-17 Mapping a unary 1:N relationship

(a) EMPLOYEE entity with unary relationship

(b) EMPLOYEE relation with recursive foreign key (ER Studio notation)


Figure 4-18 Mapping a unary M:N relationship

(a) Bill-of-materials relationships (M:N)

(b) ITEM and COMPONENT relations (ER Studio notation with sample data)

ITEM

ItemNo   ItemDescrip
ADC8     Audio Card
MBD2     Motherboard
PCD2     PC Dual Core
PCQ5     PC Quad Core
RAM9     1GB RAM Chip
. . .

COMPONENT

ItemNo   ComponentNo   Quantity
PCD2     ADC8          1
PCD2     MBD2          1
PCD2     RAM9          4
. . .
PCQ5     ADC8          1
PCQ5     MBD2          1
PCQ5     RAM9          8
. . .
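
The many-to-many case can be tried with the sample rows above. This sketch loads only the rows that are visible on the slide (the elided rows are left out) and uses snake_case column names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        item_no      TEXT PRIMARY KEY,
        item_descrip TEXT
    );
    CREATE TABLE component (
        item_no      TEXT REFERENCES item(item_no),
        component_no TEXT REFERENCES item(item_no),
        quantity     INTEGER,
        -- Both key attributes come from the primary key of ITEM.
        PRIMARY KEY (item_no, component_no)
    );
""")
conn.executemany("INSERT INTO item VALUES (?, ?)", [
    ("ADC8", "Audio Card"), ("MBD2", "Motherboard"), ("PCD2", "PC Dual Core"),
    ("PCQ5", "PC Quad Core"), ("RAM9", "1GB RAM Chip"),
])
conn.executemany("INSERT INTO component VALUES (?, ?, ?)", [
    ("PCD2", "ADC8", 1), ("PCD2", "MBD2", 1), ("PCD2", "RAM9", 4),
    ("PCQ5", "ADC8", 1), ("PCQ5", "MBD2", 1), ("PCQ5", "RAM9", 8),
])

# Parts list: which components go into a PC Quad Core, and how many?
parts = conn.execute("""
    SELECT i.item_descrip, c.quantity
    FROM component AS c
    JOIN item AS i ON i.item_no = c.component_no
    WHERE c.item_no = 'PCQ5'
    ORDER BY c.component_no
""").fetchall()

# Where-used: which items contain the RAM chip?
used_in = [row[0] for row in conn.execute(
    "SELECT item_no FROM component WHERE component_no = 'RAM9' ORDER BY item_no"
)]
print(parts, used_in)
```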
