DATA NORMALIZATION CS 260 Database Systems


Overview

- Introduction
- Anomalies
- Functional dependence
- Normal forms
  - 1NF
  - 2NF
  - 3NF
  - BCNF
- Denormalization


Introduction

- Database normalization is a process used to generate a schema that avoids unnecessary redundancy while still allowing information to be retrieved easily
  - It consists primarily of breaking tables into smaller tables to remove redundant data that can lead to anomalies
- The results of the normalization process allow schemas to be described as adhering to a particular “normal form”


Introduction

- Normalization requires domain-specific knowledge in order to identify “functional dependencies”
  - Some of this knowledge may be expressed in an ER model, but not always
- Normalization is particularly useful for reviewing and fixing an existing (and possibly poorly designed) database schema
- Normalization allows for a design that is free of insertion, update, and deletion anomalies


Anomalies

- Insertion anomaly
  - Occurs when inserting a new record causes data to become inconsistent
    - In the following example, an insertion anomaly occurred when Franklin T. Wong’s employee record was first inserted
    - His department manager’s SSN was entered incorrectly (333444555 instead of 333445555)
  - Also occurs when a new record cannot be inserted due to missing data
    - In the following example, an insertion anomaly would occur if an attempt were made to insert a record for a new project into the EMPLOYEE-PROJECTS table
    - The project cannot be inserted if it doesn’t have any associated employees


Anomalies

EMPLOYEE (SSN, EName, Bdate, Address, Dnumber, DName, DMgrSSN)

| SSN | EName | Bdate | Address | Dnumber | DName | DMgrSSN |
|---|---|---|---|---|---|---|
| 123456789 | Smith, John B. | 09-Jan-55 | 731 Fondren, Houston, TX | 5 | Research | 333445555 |
| 333445555 | Wong, Franklin T. | 08-Dec-45 | 638 Voss, Houston, TX | 5 | Research | 333444555 |
| 999887777 | Zelaya, Alicia J. | 19-Jul-61 | 3321 Castle, Spring, TX | 4 | Administration | 987654321 |
| 987654321 | Wallace, Jennifer S. | 19-Jun-31 | 291 Berry, Bellaire, TX | 4 | Administration | 987654321 |
| 666884444 | Narayan, Ramesh K. | 15-Sep-52 | 975 FireOak, Humble, TX | 5 | Research | 333445555 |
| 453453453 | English, Joyce A. | 31-Jun-62 | 5631 Rice, Houston, TX | 5 | Research | 333445555 |
| 987987987 | Jabbar, Ahmad V. | 29-Mar-59 | 980 Dallas, Houston, TX | 4 | Administration | 987654321 |
| 888665555 | Borg, James E. | 10-Nov-27 | 450 Stone, Houston, TX | 1 | Headquarters | 888665555 |

EMPLOYEE-PROJECTS (SSN, EName, ProjNumber, Hours, ProjName, ProjLocation)

| SSN | EName | ProjNumber | Hours | ProjName | ProjLocation |
|---|---|---|---|---|---|
| 123456789 | Smith, John B. | 1 | 32.5 | ProductX | Stafford |
| 123456789 | Smith, John B. | 2 | 7.5 | ProductY | Sugarland |
| 666884444 | Narayan, Ramesh K. | 3 | 40.0 | ProductZ | Houston |
| 453453453 | English, Joyce A. | 1 | 20.0 | ProductX | Bellaire |
| 453453453 | Meadows, Joyce A. | 2 | 20.0 | ProductY | Sugarland |
| 333445555 | Wong, Frank | 2 | 10.0 | ProductY | Bellaire |
| 333445555 | Wong, Frank | 3 | 10.0 | ProductZ | Houston |
| 333445555 | Wong, Franklin T. | 10 | 10.0 | Computerization | Stafford |
| 333445555 | Wong, Franklin T. | 20 | 10.0 | Reorganization | Houston |


Anomalies

- Update anomaly
  - Occurs when some but not all instances of a data value are updated
    - In the EMPLOYEE and EMPLOYEE-PROJECTS tables above, an update anomaly occurred when Joyce English’s records were updated to accommodate her last name change
    - The name was updated in the EMPLOYEE table and in one record of the EMPLOYEE-PROJECTS table, but not in the second EMPLOYEE-PROJECTS record, which still reads “Meadows”
    - An update anomaly may also have occurred if an attempt was made to change ProductX’s location to “Bellaire”
    - One of the two records for project 1 was missed in the update and still shows “Stafford”
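Inconsistencies of this kind can be found mechanically. The following is a minimal sketch, not part of the original slides: it copies the EMPLOYEE-PROJECTS rows into Python tuples and reports every key that maps to more than one value. The helper name `conflicting_values` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the EMPLOYEE-PROJECTS table:
# (SSN, EName, ProjNumber, ProjName, ProjLocation); Hours is omitted for brevity.
ROWS = [
    ("123456789", "Smith, John B.", 1, "ProductX", "Stafford"),
    ("123456789", "Smith, John B.", 2, "ProductY", "Sugarland"),
    ("666884444", "Narayan, Ramesh K.", 3, "ProductZ", "Houston"),
    ("453453453", "English, Joyce A.", 1, "ProductX", "Bellaire"),
    ("453453453", "Meadows, Joyce A.", 2, "ProductY", "Sugarland"),
    ("333445555", "Wong, Frank", 2, "ProductY", "Bellaire"),
    ("333445555", "Wong, Frank", 3, "ProductZ", "Houston"),
    ("333445555", "Wong, Franklin T.", 10, "Computerization", "Stafford"),
    ("333445555", "Wong, Franklin T.", 20, "Reorganization", "Houston"),
]


def conflicting_values(rows, key_index, value_index):
    """Return {key: set of values} for every key that maps to more than one value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_index]].add(row[value_index])
    return {key: values for key, values in seen.items() if len(values) > 1}


# One SSN should have one name; one project number should have one location.
name_conflicts = conflicting_values(ROWS, key_index=0, value_index=1)
location_conflicts = conflicting_values(ROWS, key_index=2, value_index=4)

for ssn, names in sorted(name_conflicts.items()):
    print("SSN", ssn, "has names", sorted(names))
for proj, locations in sorted(location_conflicts.items()):
    print("Project", proj, "has locations", sorted(locations))
```

A check like this only detects the symptom after the fact. Normalization removes the redundant copies so that the conflict cannot arise in the first place.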


Anomalies

- Deletion anomaly
  - Occurs when a record is deleted to remove some data instance, but other data is inadvertently deleted as well
    - In the EMPLOYEE and EMPLOYEE-PROJECTS tables above, a deletion anomaly would occur if Franklin T. Wong’s records were removed from both tables
    - Data regarding the “Computerization” and “Reorganization” projects would then be gone, because he is the only employee assigned to them


Functional Dependence

- Functional dependence
  - Functional dependence occurs when the values of one or more attributes (A) in some entity unambiguously determine the values of one or more other attributes (B)
    - Notation: A -> B
  - In other words, if we know the values of all attributes in set A, then we can uniquely identify the values of all attributes in set B
    - These sets can consist of one or more attributes
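As a rough illustration of the definition, the sketch below tests whether a dependency A -> B is contradicted by a set of rows. It uses the rows of the 1NF project table that appears later in this deck. The function name `fd_holds` is an assumption for illustration. Sample data can only refute a dependency; whether it truly holds is a matter of domain knowledge.

```python
# Rows of the 1NF project table shown later in this deck.
COLUMNS = ("Proj#", "ProjName", "Emp#", "EmpName", "JobType", "ChgPerHour", "Hours")
ROWS = [
    (1, "Satellite", 101, "News, John", "Elect Eng", 65, 13),
    (1, "Satellite", 102, "Senior, David", "Comm Tech", 60, 16),
    (1, "Satellite", 104, "Ramoras, Ann", "Comm Tech", 60, 19),
    (2, "WAN", 101, "News, John", "Elect Eng", 65, 17),
    (2, "WAN", 104, "Ramoras, Ann", "Comm Tech", 60, 25),
]


def fd_holds(rows, columns, determinant, dependent):
    """Return False if two rows agree on `determinant` but differ on `dependent`."""
    lhs = [columns.index(name) for name in determinant]
    rhs = [columns.index(name) for name in dependent]
    seen = {}
    for row in rows:
        a = tuple(row[i] for i in lhs)
        b = tuple(row[i] for i in rhs)
        # setdefault stores b the first time a is seen and returns the stored value after that.
        if seen.setdefault(a, b) != b:
            return False
    return True


print(fd_holds(ROWS, COLUMNS, ["JobType"], ["ChgPerHour"]))   # not contradicted by the data
print(fd_holds(ROWS, COLUMNS, ["Proj#"], ["Hours"]))          # contradicted by the data
print(fd_holds(ROWS, COLUMNS, ["Proj#", "Emp#"], ["Hours"]))  # not contradicted by the data
```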


Functional Dependence

- Strategies for determining functional dependence
  - For each field, ask if its value can be determined if the values of one or more other fields are known
    - Is the field dependent on one or more other fields?
  - For each field, ask whether, if its value is known, the values of any other fields can be identified
    - Is the field a determinant for one or more other fields?
  - Group functional dependencies with the same determinant into a single relation
- The following types of (trivial) functional dependencies can be ignored
  - {A, B} -> A
  - {A, B} -> B
  - {A, B} -> {A, B}


Functional Dependence

- Super keys
  - A super key is a set of attributes (possibly consisting of a single attribute) that uniquely identifies a record
- Example
  - In the CANDY_CUSTOMER table in our candy database, {cust_id} is a super key
  - {cust_id, cust_name} is also a super key
  - {cust_id, cust_type} is also a super key
  - Any combination of attributes in CANDY_CUSTOMER that includes a super key is also a super key


Sample Database (CANDY)

CANDY_CUSTOMER

| CUST_ID | CUST_NAME | CUST_TYPE | CUST_ADDR | CUST_ZIP | CUST_PHONE | USERNAME | PASSWORD |
|---|---|---|---|---|---|---|---|
| 1 | Jones, Joe | P | 1234 Main St. | 91212 | 434-1231 | jonesj | 1234 |
| 2 | Armstrong,Inc. | R | 231 Globe Blvd. | 91212 | 434-7664 | armstrong | 3333 |
| 3 | Swedish Burgers | R | 1889 20th N.E. | 91213 | 434-9090 | swedburg | 2353 |
| 4 | Pickled Pickles | R | 194 CityView | 91289 | 324-8909 | pickpick | 5333 |
| 5 | The Candy Kid | W | 2121 Main St. | 91212 | 563-4545 | kidcandy | 2351 |
| 6 | Waterman, Al | P | 23 Yankee Blvd. | 91234 | | wateral | 8900 |
| 7 | Bobby Bon Bons | R | 12 Nichi Cres. | 91212 | 434-9045 | bobbybon | 3011 |
| 8 | Crowsh, Elias | P | 7 77th Ave. | 91211 | 434-0007 | crowel | 1033 |
| 9 | Montag, Susie | P | 981 Montview | 91213 | 456-2091 | montags | 9633 |
| 10 | Columberg Sweets | W | 239 East Falls | 91209 | 874-9092 | columswe | 8399 |

CANDY_PURCHASE

| PURCH_ID | PROD_ID | CUST_ID | PURCH_DATE | DELIVERY_DATE | POUNDS | STATUS |
|---|---|---|---|---|---|---|
| 1 | 1 | 5 | 28-Oct-04 | 28-Oct-04 | 3.5 | PAID |
| 2 | 2 | 6 | 28-Oct-04 | 30-Oct-04 | 15 | PAID |
| 3 | 1 | 9 | 28-Oct-04 | 28-Oct-04 | 2 | PAID |
| 3 | 3 | 9 | 28-Oct-04 | 28-Oct-04 | 3.7 | PAID |
| 4 | 3 | 2 | 28-Oct-04 | | 3.7 | PAID |
| 5 | 1 | 7 | 29-Oct-04 | 29-Oct-04 | 3.7 | NOT PAID |
| 5 | 2 | 7 | 29-Oct-04 | 29-Oct-04 | 1.2 | NOT PAID |
| 5 | 3 | 7 | 29-Oct-04 | 29-Oct-04 | 4.4 | NOT PAID |
| 6 | 2 | 7 | 29-Oct-04 | | 3 | PAID |
| 7 | 2 | 10 | 29-Oct-04 | | 14 | NOT PAID |
| 7 | 5 | 10 | 29-Oct-04 | | 4.8 | NOT PAID |
| 8 | 1 | 4 | 29-Oct-04 | 29-Oct-04 | 1 | PAID |
| 8 | 5 | 4 | 29-Oct-04 | | 7.6 | PAID |
| 9 | 5 | 4 | 29-Oct-04 | 29-Oct-04 | 3.5 | NOT PAID |

CANDY_PRODUCT

| PROD_ID | PROD_DESC | PROD_COST | PROD_PRICE |
|---|---|---|---|
| 1 | Celestial Cashew Crunch | $7.45 | $10.00 |
| 2 | Unbrittle Peanut Paradise | $5.75 | $9.00 |
| 3 | Mystery Melange | $7.75 | $10.50 |
| 4 | Millionaire’s Macadamia Mix | $12.50 | $16.00 |
| 5 | Nuts Not Nachos | $6.25 | $9.50 |

CANDY_CUST_TYPE

| CUST_TYPE_ID | CUST_TYPE_DESC |
|---|---|
| P | Private |
| R | Retail |
| W | Wholesale |


Functional Dependence

- Candidate keys
  - A candidate key is a super key with a minimal set of attributes
  - Unlike a primary key, a table can potentially have more than one candidate key
    - So a primary key is a candidate key, but a candidate key is not necessarily a primary key
- Example
  - In the CANDY_CUSTOMER table in our candy database, {cust_id} is a candidate key
    - Adding any other attribute to this set would yield a super key, but not a candidate key
  - If usernames also must be unique, then {username} is also a candidate key
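The difference between super keys and candidate keys can be sketched with an attribute-closure computation. This is an illustrative sketch rather than course material: it assumes the only functional dependencies on CANDY_CUSTOMER are that cust_id and (if usernames are unique) username each determine every attribute. The function names `closure`, `is_super_key`, and `candidate_keys` are hypothetical.

```python
from itertools import combinations

ATTRS = ["cust_id", "cust_name", "cust_type", "cust_addr",
         "cust_zip", "cust_phone", "username", "password"]

# Assumed dependencies: each of cust_id and username determines the whole record.
FDS = [({"cust_id"}, set(ATTRS)), ({"username"}, set(ATTRS))]


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs, all_attrs, fds):
    """A super key determines every attribute of the table."""
    return closure(attrs, fds) >= set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Brute-force search for minimal super keys, smallest sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            candidate = set(combo)
            # Skip supersets of a key already found: they are super keys but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if is_super_key(candidate, all_attrs, fds):
                keys.append(candidate)
    return keys


print(is_super_key({"cust_id", "cust_name"}, ATTRS, FDS))
print(candidate_keys(ATTRS, FDS))
```

The brute-force search visits every subset of attributes, which is fine for a table of this size but grows exponentially with the number of columns.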


Functional Dependence

- Functional dependence is often illustrated for a table using a dependency diagram
  - This diagram identifies the fields whose values determine the values of other fields
  - Arrows are drawn from the “determinant” fields to the “dependent” fields
- We’ll see examples of these as we discuss normal forms in more detail


First Normal Form

- First normal form (1NF)
  - A database table is in 1NF if it meets the following requirements:
    - It does not contain any multivalued attributes
    - It does not contain any inappropriately complex attributes
- Solution for converting a database table to 1NF
  - Replace multivalued attributes with multiple records
  - Replace complex attributes with atomic attributes


First Normal Form

1NF example

Non-1NF Table

| Proj# | ProjName | Emp# | EmpName | JobType | ChgPerHour | Hours |
|---|---|---|---|---|---|---|
| 1 | Satellite | 101, 102, 104 | News, John; Senior, David; Ramoras, Ann | Elect Eng, Comm Tech, Comm Tech | 65, 60, 60 | 13, 16, 19 |

Corresponding 1NF Table

| Proj# | ProjName | Emp# | EmpName | JobType | ChgPerHour | Hours |
|---|---|---|---|---|---|---|
| 1 | Satellite | 101 | News, John | Elect Eng | 65 | 13 |
| 1 | Satellite | 102 | Senior, David | Comm Tech | 60 | 16 |
| 1 | Satellite | 104 | Ramoras, Ann | Comm Tech | 60 | 19 |

This assumes that an EmpName will never need to be searched, sorted, or formatted according to first/last names
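The "replace multivalued attributes with multiple records" step can be sketched as follows. This is an illustration under assumptions: the non-1NF record is modeled as a Python dict whose multivalued attributes are parallel lists, and `to_1nf` is a hypothetical helper name.

```python
# The non-1NF record, with each multivalued attribute modeled as a list.
nested = {
    "Proj#": 1,
    "ProjName": "Satellite",
    "Emp#": [101, 102, 104],
    "EmpName": ["News, John", "Senior, David", "Ramoras, Ann"],
    "JobType": ["Elect Eng", "Comm Tech", "Comm Tech"],
    "ChgPerHour": [65, 60, 60],
    "Hours": [13, 16, 19],
}


def to_1nf(record):
    """Expand parallel list-valued attributes into one flat row per list position."""
    multi = [name for name, value in record.items() if isinstance(value, list)]
    lengths = {len(record[name]) for name in multi}
    if len(lengths) > 1:
        raise ValueError("multivalued attributes must have the same number of values")
    count = lengths.pop() if lengths else 1
    return [
        {name: (value[i] if isinstance(value, list) else value)
         for name, value in record.items()}
        for i in range(count)
    ]


for row in to_1nf(nested):
    print(row)
```

Single-valued attributes such as Proj# and ProjName are repeated in every output row, which is exactly the redundancy that the later normal forms remove.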


First Normal Form

Functional dependencies of 1NF table

| Proj# | ProjName | Emp# | EmpName | JobType | ChgPerHour | Hours |
|---|---|---|---|---|---|---|
| 1 | Satellite | 101 | News, John | Elect Eng | 65 | 13 |
| 1 | Satellite | 102 | Senior, David | Comm Tech | 60 | 16 |
| 1 | Satellite | 104 | Ramoras, Ann | Comm Tech | 60 | 19 |
| 2 | WAN | 101 | News, John | Elect Eng | 65 | 17 |
| 2 | WAN | 104 | Ramoras, Ann | Comm Tech | 60 | 25 |

| Determinants | Dependents |
|---|---|
| {Proj#} | {ProjName} |
| {Proj#, Emp#} | {ProjName, EmpName, JobType, ChgPerHour, Hours} |
| {Emp#} | {EmpName, JobType, ChgPerHour} |
| {JobType} | {ChgPerHour} |

Candidate key(s): {Proj#, Emp#}


First Normal Form

Corresponding dependency diagram

[Figure: dependency diagram over the 1NF table columns Proj#, ProjName, Emp#, EmpName, JobType, ChgPerHour, Hours]


Second Normal Form

- Second normal form (2NF)
  - A database table is in 2NF if it meets the following requirements:
    - It is in 1NF, and
    - All non-prime attributes depend on all attributes of each candidate key (no “partial dependencies”)
  - A “non-prime” attribute is one that does not belong to any candidate key of the table
  - As a result, if all of a table’s candidate keys consist of only single attributes, and it is already in 1NF, then it is already in 2NF


Second Normal Form

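Non-2NF table (previously seen table in 1NF)

| Proj# | ProjName | Emp# | EmpName | JobType | ChgPerHour | Hours |
|---|---|---|---|---|---|---|
| 1 | Satellite | 101 | News, John | Elect Eng | 65 | 13 |
| 1 | Satellite | 102 | Senior, David | Comm Tech | 60 | 16 |
| 1 | Satellite | 104 | Ramoras, Ann | Comm Tech | 60 | 19 |
| 2 | WAN | 101 | News, John | Elect Eng | 65 | 17 |
| 2 | WAN | 104 | Ramoras, Ann | Comm Tech | 60 | 25 |

Non-prime attributes dependent on only a part of the lone candidate key {Proj#, Emp#}: ProjName depends only on Proj#, while EmpName, JobType, and ChgPerHour depend only on Emp#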
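The partial dependencies can also be found mechanically from the listed functional dependencies. The sketch below is illustrative only: `partial_dependencies` is a hypothetical helper, and it inspects just the dependencies it is given rather than every dependency they imply.

```python
# Functional dependencies of the 1NF project table, as (determinant, dependents).
FDS = [
    ({"Proj#"}, {"ProjName"}),
    ({"Proj#", "Emp#"}, {"ProjName", "EmpName", "JobType", "ChgPerHour", "Hours"}),
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]
CANDIDATE_KEYS = [{"Proj#", "Emp#"}]


def partial_dependencies(fds, candidate_keys):
    """Return dependencies whose determinant is a proper subset of a candidate key
    and whose dependents include at least one non-prime attribute."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        non_prime = rhs - prime
        # `lhs < key` is Python's proper-subset test for sets.
        if non_prime and any(lhs < key for key in candidate_keys):
            found.append((lhs, non_prime))
    return found


for lhs, rhs in partial_dependencies(FDS, CANDIDATE_KEYS):
    print(sorted(lhs), "->", sorted(rhs))
```

{JobType} -> {ChgPerHour} is not reported here because JobType is not part of the candidate key; it is a different kind of problem that 3NF addresses.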


Second Normal Form

- Solution for converting a table to 2NF
  - Convert it to 1NF (if it’s not already in 1NF)
  - Create a table for each of the functional dependencies that involve only a part of the candidate key
    - Those candidate key components should now be candidate keys in their new tables
  - If an M:M relationship exists between the entities that are now in separate tables, or the relationship has attributes
    - Create a linking table containing each of those candidate key components, as well as any attributes that were originally dependent on the entire candidate key
  - Otherwise, add a foreign key appropriate for the relationship type


Second Normal Form

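Corresponding 2NF tables

| Proj# | Emp# | Hours |
|---|---|---|
| 1 | 101 | 13 |
| 1 | 102 | 16 |
| 1 | 104 | 19 |
| 2 | 101 | 17 |
| 2 | 104 | 25 |

| Emp# | EmpName | JobType | ChgPerHour |
|---|---|---|---|
| 101 | News, John | Elect Eng | 65 |
| 102 | Senior, David | Comm Tech | 60 |
| 104 | Ramoras, Ann | Comm Tech | 60 |

| Proj# | ProjName |
|---|---|
| 1 | Satellite |
| 2 | WAN |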
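Each 2NF table is a projection of the 1NF table onto a subset of its columns with duplicate rows removed. A minimal sketch, assuming the 1NF rows are held as tuples; the helper `project` and the variable names for the three resulting tables are hypothetical, since the slides do not name them.

```python
COLUMNS = ("Proj#", "ProjName", "Emp#", "EmpName", "JobType", "ChgPerHour", "Hours")
ROWS = [
    (1, "Satellite", 101, "News, John", "Elect Eng", 65, 13),
    (1, "Satellite", 102, "Senior, David", "Comm Tech", 60, 16),
    (1, "Satellite", 104, "Ramoras, Ann", "Comm Tech", 60, 19),
    (2, "WAN", 101, "News, John", "Elect Eng", 65, 17),
    (2, "WAN", 104, "Ramoras, Ann", "Comm Tech", 60, 25),
]


def project(rows, columns, wanted):
    """Keep only the `wanted` columns and drop duplicate rows, preserving order."""
    indexes = [columns.index(name) for name in wanted]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(tuple(row[i] for i in indexes) for row in rows))


# One table per partial dependency, plus the linking table keyed on the full candidate key.
project_table = project(ROWS, COLUMNS, ["Proj#", "ProjName"])
employee_table = project(ROWS, COLUMNS, ["Emp#", "EmpName", "JobType", "ChgPerHour"])
assignment_table = project(ROWS, COLUMNS, ["Proj#", "Emp#", "Hours"])

print(project_table)
print(employee_table)
print(assignment_table)
```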


Third Normal Form

- Third normal form (3NF)
  - A database table is in 3NF if it meets the following requirements:
    - It is in 2NF, and
    - All non-prime attributes depend only on the candidate keys of the table, never on other non-prime attributes (no “transitive dependencies”)
- Solution for converting a table to 3NF
  - Convert it to 2NF (if it isn’t already in 2NF)
  - Create a table for each of the offending functional dependencies and join appropriately to the original table


Third Normal Form

Non-3NF table (previously seen tables in 2NF)

| Proj# | Emp# | Hours |
|---|---|---|
| 1 | 101 | 13 |
| 1 | 102 | 16 |
| 1 | 104 | 19 |
| 2 | 101 | 17 |
| 2 | 104 | 25 |

| Emp# | EmpName | JobType | ChgPerHour |
|---|---|---|---|
| 101 | News, John | Elect Eng | 65 |
| 102 | Senior, David | Comm Tech | 60 |
| 104 | Ramoras, Ann | Comm Tech | 60 |

| Proj# | ProjName |
|---|---|
| 1 | Satellite |
| 2 | WAN |

The second table violates 3NF
- Transitive dependency: Emp# -> JobType -> ChgPerHour
- Non-prime ChgPerHour depends on the non-prime JobType attribute
- Non-prime attributes: EmpName, JobType, ChgPerHour
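The 3NF test can be phrased as: for every listed dependency, either the determinant is a super key or the dependents are prime. The sketch below applies that test to the employee table. It is an illustration only; `closure` and `third_nf_violations` are hypothetical names, and only the listed dependencies are examined.

```python
ATTRS = {"Emp#", "EmpName", "JobType", "ChgPerHour"}
FDS = [
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]
CANDIDATE_KEYS = [{"Emp#"}]


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs, fds, candidate_keys):
    """Return dependencies whose determinant is not a super key
    and whose dependents include a non-prime attribute."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # the determinant is a super key, so this dependency is fine
        offending = rhs - lhs - prime
        if offending:
            violations.append((lhs, offending))
    return violations


violations = third_nf_violations(ATTRS, FDS, CANDIDATE_KEYS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```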


Third Normal Form

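Corresponding 3NF tables

| Proj# | Emp# | Hours |
|---|---|---|
| 1 | 101 | 13 |
| 1 | 102 | 16 |
| 1 | 104 | 19 |
| 2 | 101 | 17 |
| 2 | 104 | 25 |

| Emp# | EmpName | JobType |
|---|---|---|
| 101 | News, John | Elect Eng |
| 102 | Senior, David | Comm Tech |
| 104 | Ramoras, Ann | Comm Tech |

| Proj# | ProjName |
|---|---|
| 1 | Satellite |
| 2 | WAN |

| JobType | ChgPerHour |
|---|---|
| Elect Eng | 65 |
| Comm Tech | 60 |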
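A useful sanity check on a decomposition is that joining the smaller tables reproduces the original rows. The sketch below rebuilds the 1NF project table from the four 3NF tables. The variable names and the `rebuild` helper are assumptions for illustration, with dictionaries standing in for tables keyed on their candidate keys.

```python
# The four 3NF tables; dicts are keyed on each table's candidate key.
assignment = [(1, 101, 13), (1, 102, 16), (1, 104, 19), (2, 101, 17), (2, 104, 25)]
employee = {
    101: ("News, John", "Elect Eng"),
    102: ("Senior, David", "Comm Tech"),
    104: ("Ramoras, Ann", "Comm Tech"),
}
project = {1: "Satellite", 2: "WAN"}
job_rate = {"Elect Eng": 65, "Comm Tech": 60}


def rebuild(assignment, employee, project, job_rate):
    """Join the 3NF tables back into rows of
    (Proj#, ProjName, Emp#, EmpName, JobType, ChgPerHour, Hours)."""
    rows = []
    for proj_no, emp_no, hours in assignment:
        emp_name, job_type = employee[emp_no]
        rows.append((proj_no, project[proj_no], emp_no, emp_name,
                     job_type, job_rate[job_type], hours))
    return rows


original = rebuild(assignment, employee, project, job_rate)
for row in original:
    print(row)
```

Each fact is now stored once: changing the Comm Tech rate means updating a single entry in `job_rate`, yet every rebuilt row reflects it.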


Boyce-Codd Normal Form (BCNF)

- Boyce-Codd normal form (BCNF)
  - A database table is in BCNF if it meets the following requirements:
    - It is in 3NF, and
    - Every determinant in the table is a candidate key (or, more generally, a super key)
  - It’s uncommon for a table to be in 3NF but not BCNF
    - 3NF adds restrictions between non-prime attributes and candidate keys while BCNF adds restrictions between candidate key components
  - For a table to be in 3NF but not BCNF, it must contain two or more overlapping composite candidate keys
    - A component of one of these candidate keys must determine a component of another candidate key


Boyce-Codd Normal Form (BCNF)

Non-BCNF table (3NF)

TENNIS COURT BOOKING

| Court | Time Slot | Rate Type |
|---|---|---|
| 1 | 1 | SAVER |
| 1 | 3 | SAVER |
| 1 | 4 | STANDARD |
| 2 | 1 | PREMIUM-B |
| 2 | 4 | PREMIUM-B |
| 2 | 5 | PREMIUM-A |

Candidate keys
- {Court, Time Slot}
- {Rate Type, Time Slot}

Offending functional dependency
- {Rate Type} -> {Court}
- SAVER and STANDARD rate types apply to court 1 while PREMIUM-A and PREMIUM-B rate types apply to court 2
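The BCNF problem can be seen directly in the sample rows: Rate Type determines Court, yet Rate Type alone does not identify a row. The following sketch checks both facts against the booking data. The helper names `values`, `determines`, and `is_unique` are hypothetical, and a check on sample rows can only illustrate a dependency, not prove it.

```python
COLUMNS = ("Court", "Time Slot", "Rate Type")
ROWS = [
    (1, 1, "SAVER"),
    (1, 3, "SAVER"),
    (1, 4, "STANDARD"),
    (2, 1, "PREMIUM-B"),
    (2, 4, "PREMIUM-B"),
    (2, 5, "PREMIUM-A"),
]


def values(row, names):
    """Pick the named columns out of a row."""
    return tuple(row[COLUMNS.index(name)] for name in names)


def determines(rows, lhs, rhs):
    """True if no two rows agree on `lhs` while differing on `rhs`."""
    mapping = {}
    for row in rows:
        dependent = values(row, rhs)
        if mapping.setdefault(values(row, lhs), dependent) != dependent:
            return False
    return True


def is_unique(rows, names):
    """True if the named columns identify each row in the sample."""
    keys = [values(row, names) for row in rows]
    return len(set(keys)) == len(keys)


# A determinant that is not a key is what BCNF forbids.
violates_bcnf = determines(ROWS, ["Rate Type"], ["Court"]) and not is_unique(ROWS, ["Rate Type"])
print(violates_bcnf)
print(is_unique(ROWS, ["Court", "Time Slot"]), is_unique(ROWS, ["Rate Type", "Time Slot"]))
```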


Boyce-Codd Normal Form (BCNF)

- Solution for converting a table to BCNF
  - Convert it to 3NF (if it isn’t already in 3NF)
  - Create a separate table for the offending functional dependency and join appropriately to the original table

TENNIS COURT BOOKING

| Rate Type | Time Slot |
|---|---|
| SAVER | 1 |
| SAVER | 3 |
| STANDARD | 4 |
| PREMIUM-B | 1 |
| PREMIUM-B | 4 |
| PREMIUM-A | 5 |

TENNIS COURT RATES

| Court | Rate Type |
|---|---|
| 1 | SAVER |
| 1 | STANDARD |
| 2 | PREMIUM-A |
| 2 | PREMIUM-B |


Normal Forms

- Other normal forms exist as well
  - 4NF
  - 5NF
  - 6NF
- These either rarely occur or are more theoretical
  - 1NF through BCNF are adequate for practical use
- Table derivations from ER diagrams usually result in a 3NF design
- Dependency diagrams can be used for revisions and verification


Denormalization

- Normalization results in more tables for the same data, which increases processing complexity
  - Complex joins require more processing
  - Increased disk I/O for select, insert, update, and delete operations as well
- If performance is significantly impacted, a database schema may need to be “denormalized”
  - Tables may be combined into fewer tables, or views may be created, to avoid the need for joins and multiple-table inserts, updates, and deletes
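The view option can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not part of the course material: the table and column names (`employee`, `job_rate`, `employee_rate`, and so on) are hypothetical renderings of the 3NF employee and job-rate tables from earlier in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized tables plus a view that presents them as one wider table.
conn.executescript("""
CREATE TABLE job_rate (
    job_type     TEXT PRIMARY KEY,
    chg_per_hour INTEGER NOT NULL
);
CREATE TABLE employee (
    emp_no   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    job_type TEXT NOT NULL REFERENCES job_rate(job_type)
);
CREATE VIEW employee_rate AS
    SELECT e.emp_no, e.emp_name, e.job_type, r.chg_per_hour
    FROM employee AS e
    JOIN job_rate AS r ON r.job_type = e.job_type;
""")

conn.executemany("INSERT INTO job_rate VALUES (?, ?)",
                 [("Elect Eng", 65), ("Comm Tech", 60)])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(101, "News, John", "Elect Eng"),
                  (102, "Senior, David", "Comm Tech"),
                  (104, "Ramoras, Ann", "Comm Tech")])

# Queries against the view need no explicit join.
rows = conn.execute("SELECT * FROM employee_rate ORDER BY emp_no").fetchall()
for row in rows:
    print(row)
```

An ordinary view like this one simplifies queries but still executes the join each time it is read. Avoiding the join cost itself requires physically combining the tables or using a materialized view where the database supports one.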


Class Exercise

Convert the following table into BCNF

INVOICE

| Invoice | Customer | Name | Address | Part | Price | Quantity |
|---|---|---|---|---|---|---|
| 1001 | 43 | Jim Jones | 12 Main St. | Screw, Nut, Washer | 0.10, 0.05, 0.05 | 200, 300, 100 |
| 1002 | 55 | John Smith | 13 Main St. | Screw, Brace | 0.10, 5.00 | 100, 1 |
| 1003 | 43 | Jim Jones | 12 Main St. | Saw | 12.00 | 10 |


Class Exercise

- Is the table in 1NF?
  - Suppose customer names may be searched and sorted according to first and last names
  - It contains multivalued attributes as well as inappropriately complex attributes, so it is not in 1NF
- Solution for converting a database table to 1NF
  - Replace multivalued attributes with multiple records
  - Replace complex attributes with atomic attributes
  - It may make sense to add fields to the table’s primary key


Class Exercise

INVOICE (old)

| Invoice | Customer | Name | Address | Part | Price | Quantity |
|---|---|---|---|---|---|---|
| 1001 | 43 | Jim Jones | 12 Main St. | Screw, Nut, Washer | 0.10, 0.05, 0.05 | 200, 300, 100 |
| 1002 | 55 | John Smith | 13 Main St. | Screw, Brace | 0.10, 5.00 | 100, 1 |
| 1003 | 43 | Jim Jones | 12 Main St. | Saw | 12.00 | 10 |

INVOICE (new)

| Invoice | Customer | FName | LName | Address | Part | Price | Quantity |
|---|---|---|---|---|---|---|---|
| 1001 | 43 | Jim | Jones | 12 Main St. | Screw | 0.10 | 200 |
| 1001 | 43 | Jim | Jones | 12 Main St. | Nut | 0.05 | 300 |
| 1001 | 43 | Jim | Jones | 12 Main St. | Washer | 0.05 | 100 |
| 1002 | 55 | John | Smith | 13 Main St. | Screw | 0.10 | 100 |
| 1002 | 55 | John | Smith | 13 Main St. | Brace | 5.00 | 1 |
| 1003 | 43 | Jim | Jones | 12 Main St. | Saw | 12.00 | 10 |


Class Exercise

- Is the table in 2NF?
  - Identify the table’s functional dependencies
  - There are non-prime attributes that depend on only a component of the candidate key {Invoice, Part}, so it is not in 2NF
    - {Invoice} -> {Customer, FName, LName, Address}
    - {Part} -> {Price}

INVOICE

| Invoice | Customer | FName | LName | Address | Part | Price | Quantity |
|---|---|---|---|---|---|---|---|
| 1001 | 43 | Jim | Jones | 12 Main St. | Screw | 0.10 | 200 |
| 1001 | 43 | Jim | Jones | 12 Main St. | Nut | 0.05 | 300 |
| 1001 | 43 | Jim | Jones | 12 Main St. | Washer | 0.05 | 100 |
| 1002 | 55 | John | Smith | 13 Main St. | Screw | 0.10 | 100 |
| 1002 | 55 | John | Smith | 13 Main St. | Brace | 5.00 | 1 |
| 1003 | 43 | Jim | Jones | 12 Main St. | Saw | 12.00 | 10 |


Class Exercise

- Solution for converting a table to 2NF
  - Convert it to 1NF (if it’s not already in 1NF)
  - Create a table for each of the functional dependencies that involve only a part of the candidate key
    - Those candidate key components should now be candidate keys in their new tables
  - Create a linking table containing each of those candidate key components, as well as any attributes that were originally dependent on the entire candidate key


Class Exercise

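INVOICE (old)

| Invoice | Customer | FName | LName | Address | Part | Price | Quantity |
|---|---|---|---|---|---|---|---|
| 1001 | 43 | Jim | Jones | 12 Main St. | Screw | 0.10 | 200 |
| 1001 | 43 | Jim | Jones | 12 Main St. | Nut | 0.05 | 300 |
| 1001 | 43 | Jim | Jones | 12 Main St. | Washer | 0.05 | 100 |
| 1002 | 55 | John | Smith | 13 Main St. | Screw | 0.10 | 100 |
| 1002 | 55 | John | Smith | 13 Main St. | Brace | 5.00 | 1 |
| 1003 | 43 | Jim | Jones | 12 Main St. | Saw | 12.00 | 10 |

PART (new)

| Part | Price |
|---|---|
| Screw | 0.10 |
| Nut | 0.05 |
| Washer | 0.05 |
| Brace | 5.00 |
| Saw | 12.00 |

INVOICE-PART (new)

| Invoice | Part | Quantity |
|---|---|---|
| 1001 | Screw | 200 |
| 1001 | Nut | 300 |
| 1001 | Washer | 100 |
| 1002 | Screw | 100 |
| 1002 | Brace | 1 |
| 1003 | Saw | 10 |

INVOICE (new)

| Invoice | Customer | FName | LName | Address |
|---|---|---|---|---|
| 1001 | 43 | Jim | Jones | 12 Main St. |
| 1002 | 55 | John | Smith | 13 Main St. |
| 1003 | 43 | Jim | Jones | 12 Main St. |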


Class Exercise

- Are the tables in 3NF?
  - In the INVOICE table, the non-prime attributes FName, LName, and Address depend on Customer rather than directly on the candidate key {Invoice}, so INVOICE is not in 3NF
    - Transitive dependency: {Invoice} -> {Customer} -> {FName, LName, Address}
  - The PART and INVOICE-PART tables have no such dependency
- Solution for converting a table to 3NF
  - Convert it to 2NF (if it isn’t already in 2NF)
  - Create a table for each of the offending functional dependencies and join appropriately to the original table


Class Exercise

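3NF Conversion

PART (okay)

| Part | Price |
|---|---|
| Screw | 0.10 |
| Nut | 0.05 |
| Washer | 0.05 |
| Brace | 5.00 |
| Saw | 12.00 |

INVOICE-PART (okay)

| Invoice | Part | Quantity |
|---|---|---|
| 1001 | Screw | 200 |
| 1001 | Nut | 300 |
| 1001 | Washer | 100 |
| 1002 | Screw | 100 |
| 1002 | Brace | 1 |
| 1003 | Saw | 10 |

INVOICE (old)

| Invoice | Customer | FName | LName | Address |
|---|---|---|---|---|
| 1001 | 43 | Jim | Jones | 12 Main St. |
| 1002 | 55 | John | Smith | 13 Main St. |
| 1003 | 43 | Jim | Jones | 12 Main St. |

CUSTOMER (new)

| Customer | FName | LName | Address |
|---|---|---|---|
| 43 | Jim | Jones | 12 Main St. |
| 55 | John | Smith | 13 Main St. |

INVOICE (new)

| Invoice | Customer |
|---|---|
| 1001 | 43 |
| 1002 | 55 |
| 1003 | 43 |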


Class Exercise

- Are the tables in BCNF?
  - Every determinant in all tables is a candidate key, so they are in BCNF

PART

| Part | Price |
|---|---|
| Screw | 0.10 |
| Nut | 0.05 |
| Washer | 0.05 |
| Brace | 5.00 |
| Saw | 12.00 |

INVOICE-PART

| Invoice | Part | Quantity |
|---|---|---|
| 1001 | Screw | 200 |
| 1001 | Nut | 300 |
| 1001 | Washer | 100 |
| 1002 | Screw | 100 |
| 1002 | Brace | 1 |
| 1003 | Saw | 10 |

CUSTOMER

| Customer | FName | LName | Address |
|---|---|---|---|
| 43 | Jim | Jones | 12 Main St. |
| 55 | John | Smith | 13 Main St. |

INVOICE

| Invoice | Customer |
|---|---|
| 1001 | 43 |
| 1002 | 55 |
| 1003 | 43 |