
Normalization

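• Also called “lossless decomposition”: each step splits a table into smaller tables that can be joined back together without losing information.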

• The process of structuring tables to eliminate redundancy and to avoid update, insertion, and deletion anomalies and problems with extensibility.

• Supports the golden rule: Each fact should be stored in the database only once.

• Does not provide the solution to all design problems but provides a solid foundation.


Normal Forms

• 1st Normal Form

• 2nd Normal Form

• 3rd Normal Form

• BCNF

• 4th Normal Form

• 5th Normal Form

• Domain-Key Normal Form


1st Normal Form

First Normal Form is violated if:

• The relation has no identifiable primary key, or

• An attempt has been made to store a multi-valued fact in a tuple.
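The two 1NF tests above can be sketched in Python. This is a minimal sketch, assuming a relation is held as a list of dicts and that `first_nf_problems` (a hypothetical helper) is told which columns form the proposed primary key; treating any comma-bearing string as multi-valued is a rough heuristic, not part of the definition.

```python
from collections import Counter


def first_nf_problems(rows, key):
    """List the reasons a relation (a list of dicts) violates First Normal Form.

    `key` is the tuple of column names proposed as the primary key.
    A cell counts as multi-valued if it holds a list, tuple, or set, or
    (a rough heuristic) a string containing a comma.
    """
    problems = []
    # Test 1: the proposed primary key must identify every row uniquely.
    key_values = [tuple(row.get(col) for col in key) for row in rows]
    if any(value is None for kv in key_values for value in kv):
        problems.append("null value in key")
    duplicates = [kv for kv, n in Counter(key_values).items() if n > 1]
    if duplicates:
        problems.append(f"duplicate key values: {duplicates}")
    # Test 2: no multi-valued fact may be stored in a single tuple.
    for row, kv in zip(rows, key_values):
        for col, value in row.items():
            if isinstance(value, (list, tuple, set)) or (
                isinstance(value, str) and "," in value
            ):
                problems.append(f"multi-valued {col} for key {kv}")
    return problems


# A Schema 2 style row (multi-valued LANGUAGES) and Schema 1 style Programs rows.
schema2_rows = [
    {"EMPID": 23, "LNAME": "Jones", "LANGUAGES": "COBOL, JAVA, SQL"},
    {"EMPID": 25, "LNAME": "Smith", "LANGUAGES": None},
]
programs_rows = [
    {"EMPID": 23, "LANGUAGE": "COBOL"},
    {"EMPID": 23, "LANGUAGE": "JAVA"},
]
print(first_nf_problems(schema2_rows, ("EMPID",)))
# ['multi-valued LANGUAGES for key (23,)']
print(first_nf_problems(programs_rows, ("EMPID",)))
# ['duplicate key values: [(23,)]']
print(first_nf_problems(programs_rows, ("EMPID", "LANGUAGE")))
# []
```

The Programs rows only pass once the key is the pair (EMPID, LANGUAGE), which is the key Schema 1 relies on.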


1st NF - Example

Evaluate the design solutions on the next four slides for:

• Query-ability

• Join-ability

• Constrain-ability

• Extensibility (of Language Domain)

• Extensibility (of Schema)


1NF Example – Schema 1 (correct)

Programs Table

EMPID  LANGUAGE
23     COBOL
23     JAVA
23     SQL
31     SQL
32     COBOL
32     JAVA
32     SQL
32     VB
36     JAVA
36     SQL
36     VB
37     COBOL
37     SQL

Employees Table

EMPID  LNAME     FNAME   DEPT  PHONE     SALARY  SEX
23     Jones     Mark    ITR   555-1087  45000   M
25     Smith     Sara    FINC  555-2222  55000   F
26     Billings  David   ACTG  555-4356  42000   M
31     Dance     Ivanna  ACTG  444-4887  60000   F
32     Jones     Mary    ITR   555-8745  70000   F
35     Barker    Bob     ACTG  555-6565  44000   M
36     Woods     Robin   ITR   555-9812  90000   M
37     Jones     Mary    FINC  555-1234  56000   F

Languages Table

NAME   FULLNAME
COBOL  COmmon Business Oriented Language
SQL    Structured Query Language
JAVA   JAVA
VB     Visual Basic
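A minimal sketch of Schema 1 in SQLite, using Python's standard `sqlite3` module, shows why it scores well on the criteria listed earlier: finding programmers is a plain join, the foreign key rejects unknown languages, and a new language is just one more row in Languages. The column types, the `CHECK` on SEX, the helper `programmers_of`, and the FORTRAN test row are illustrative assumptions, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Employees (
    EMPID  INTEGER PRIMARY KEY,
    LNAME  TEXT NOT NULL,
    FNAME  TEXT NOT NULL,
    DEPT   TEXT,
    PHONE  TEXT,
    SALARY INTEGER,
    SEX    TEXT CHECK (SEX IN ('M', 'F'))
);
CREATE TABLE Languages (
    NAME     TEXT PRIMARY KEY,
    FULLNAME TEXT NOT NULL
);
-- One row per (employee, language) fact.
CREATE TABLE Programs (
    EMPID    INTEGER NOT NULL REFERENCES Employees(EMPID),
    LANGUAGE TEXT    NOT NULL REFERENCES Languages(NAME),
    PRIMARY KEY (EMPID, LANGUAGE)
);
""")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (23, "Jones", "Mark", "ITR", "555-1087", 45000, "M"),
    (25, "Smith", "Sara", "FINC", "555-2222", 55000, "F"),
    (26, "Billings", "David", "ACTG", "555-4356", 42000, "M"),
    (31, "Dance", "Ivanna", "ACTG", "444-4887", 60000, "F"),
    (32, "Jones", "Mary", "ITR", "555-8745", 70000, "F"),
    (35, "Barker", "Bob", "ACTG", "555-6565", 44000, "M"),
    (36, "Woods", "Robin", "ITR", "555-9812", 90000, "M"),
    (37, "Jones", "Mary", "FINC", "555-1234", 56000, "F"),
])
conn.executemany("INSERT INTO Languages VALUES (?, ?)", [
    ("COBOL", "COmmon Business Oriented Language"),
    ("SQL", "Structured Query Language"),
    ("JAVA", "JAVA"),
    ("VB", "Visual Basic"),
])
conn.executemany("INSERT INTO Programs VALUES (?, ?)", [
    (23, "COBOL"), (23, "JAVA"), (23, "SQL"), (31, "SQL"),
    (32, "COBOL"), (32, "JAVA"), (32, "SQL"), (32, "VB"),
    (36, "JAVA"), (36, "SQL"), (36, "VB"), (37, "COBOL"), (37, "SQL"),
])


def programmers_of(language):
    """Query-ability and join-ability: (EMPID, LNAME, FNAME) of everyone using `language`."""
    return conn.execute("""
        SELECT e.EMPID, e.LNAME, e.FNAME
        FROM Employees e JOIN Programs p ON p.EMPID = e.EMPID
        WHERE p.LANGUAGE = ?
        ORDER BY e.EMPID""", (language,)).fetchall()


# Constrain-ability: a language missing from Languages is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Programs VALUES (23, 'FORTRAN')")  # hypothetical language
    fortran_rejected = False
except sqlite3.IntegrityError:
    fortran_rejected = True

print(programmers_of("VB"))  # [(32, 'Jones', 'Mary'), (36, 'Woods', 'Robin')]
print(fortran_rejected)      # True
```

Extending the language domain here means inserting a row into Languages; no table or query has to change.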


1NF Example – Schema 2 (incorrect)

Employees Table

EMPID  LNAME     FNAME   DEPT  PHONE     SALARY  SEX  LANGUAGES
23     Jones     Mark    ITR   555-1087  45000   M    COBOL, JAVA, SQL
25     Smith     Sara    FINC  555-2222  55000   F
26     Billings  David   ACTG  555-4356  42000   M
31     Dance     Ivanna  ACTG  444-4887  60000   F    SQL
32     Jones     Mary    ITR   555-8745  70000   F    JAVA, SQL, VB, COBOL
35     Barker    Bob     ACTG  555-6565  44000   M
36     Woods     Robin   ITR   555-9812  90000   M    VB, SQL, JAVA
37     Jones     Mary    FINC  555-1234  56000   F    COBOL, SQL

Languages Table

NAME   FULLNAME
COBOL  COmmon Business Oriented Language
SQL    Structured Query Language
JAVA   JAVA
VB     Visual Basic
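Schema 2 can be converted back to Schema 1 by splitting each LANGUAGES cell into separate (EMPID, LANGUAGE) rows. A minimal sketch, assuming the comma-and-space separator shown in the table; `to_programs` is a hypothetical helper, and only the EMPID and LANGUAGES columns are kept.

```python
def to_programs(employees):
    """Split each comma-separated LANGUAGES cell into (EMPID, LANGUAGE) rows."""
    rows = set()
    for emp in employees:
        for name in (emp.get("LANGUAGES") or "").split(","):
            name = name.strip()
            if name:  # employees with no languages contribute no rows
                rows.add((emp["EMPID"], name))
    return sorted(rows)


schema2 = [
    {"EMPID": 23, "LANGUAGES": "COBOL, JAVA, SQL"},
    {"EMPID": 25, "LANGUAGES": None},
    {"EMPID": 26, "LANGUAGES": None},
    {"EMPID": 31, "LANGUAGES": "SQL"},
    {"EMPID": 32, "LANGUAGES": "JAVA, SQL, VB, COBOL"},
    {"EMPID": 35, "LANGUAGES": None},
    {"EMPID": 36, "LANGUAGES": "VB, SQL, JAVA"},
    {"EMPID": 37, "LANGUAGES": "COBOL, SQL"},
]
programs = to_programs(schema2)
print(len(programs))  # 13
print(programs[:3])   # [(23, 'COBOL'), (23, 'JAVA'), (23, 'SQL')]
```

Querying the raw Schema 2 cell instead would require substring matching (for example SQL `LIKE '%JAVA%'`), which would also match a hypothetical language named JAVASCRIPT; that is the query-ability problem in miniature.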


1NF Example – Schema 3 (incorrect)

Employees Table

EMPID  LNAME     FNAME   DEPT  PHONE     SALARY  SEX  LANG1  LANG2  LANG3  LANG4
23     Jones     Mark    ITR   555-1087  45000   M    COBOL  JAVA   SQL
25     Smith     Sara    FINC  555-2222  55000   F
26     Billings  David   ACTG  555-4356  42000   M
31     Dance     Ivanna  ACTG  444-4887  60000   F    SQL
32     Jones     Mary    ITR   555-8745  70000   F    JAVA   SQL    VB     COBOL
35     Barker    Bob     ACTG  555-6565  44000   M
36     Woods     Robin   ITR   555-9812  90000   M    VB     SQL    JAVA
37     Jones     Mary    FINC  555-1234  56000   F    COBOL  SQL

Languages Table

NAME   FULLNAME
COBOL  COmmon Business Oriented Language
SQL    Structured Query Language
JAVA   JAVA
VB     Visual Basic
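With Schema 3, every "who knows X?" query has to name each repeating column. A minimal SQLite sketch via Python's `sqlite3`, keeping only EMPID, LNAME, and the four LANGn columns; the helper name `programmers_of` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EMPID INTEGER PRIMARY KEY,
        LNAME TEXT,
        LANG1 TEXT, LANG2 TEXT, LANG3 TEXT, LANG4 TEXT  -- repeating group
    )""")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)", [
    (23, "Jones", "COBOL", "JAVA", "SQL", None),
    (25, "Smith", None, None, None, None),
    (26, "Billings", None, None, None, None),
    (31, "Dance", "SQL", None, None, None),
    (32, "Jones", "JAVA", "SQL", "VB", "COBOL"),
    (35, "Barker", None, None, None, None),
    (36, "Woods", "VB", "SQL", "JAVA", None),
    (37, "Jones", "COBOL", "SQL", None, None),
])


def programmers_of(language):
    # Every LANGn column must be searched; adding a LANG5 column means
    # changing the table *and* every query like this one.
    return [row[0] for row in conn.execute(
        "SELECT EMPID FROM Employees"
        " WHERE ? IN (LANG1, LANG2, LANG3, LANG4) ORDER BY EMPID",
        (language,))]


print(programmers_of("COBOL"))  # [23, 32, 37]
```

Nothing in this table stops the same language from appearing in two LANGn columns of one row, and a fifth language for employee 32 needs a schema change: the constrain-ability and extensibility problems.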


1NF Example – Schema 4 (incorrect)

Employees Table

EMPID  LNAME     FNAME   DEPT  PHONE     SALARY  SEX  COBOL  JAVA  SQL  VB
23     Jones     Mark    ITR   555-1087  45000   M    T      T     T    F
25     Smith     Sara    FINC  555-2222  55000   F    F      F     F    F
26     Billings  David   ACTG  555-4356  42000   M    F      F     F    F
31     Dance     Ivanna  ACTG  444-4887  60000   F    F      F     T    F
32     Jones     Mary    ITR   555-8745  70000   F    T      T     T    T
35     Barker    Bob     ACTG  555-6565  44000   M    F      F     F    F
36     Woods     Robin   ITR   555-9812  90000   M    F      T     T    T
37     Jones     Mary    FINC  555-1234  56000   F    T      F     T    F

Languages Table

NAME   FULLNAME
COBOL  COmmon Business Oriented Language
SQL    Structured Query Language
JAVA   JAVA
VB     Visual Basic
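Schema 4's T/F flags can be turned back into Schema 1 rows, but only by hard-coding the list of language columns. A minimal sketch; `LANG_COLUMNS`, `flags`, and `unpivot` are hypothetical names, and each employee's four flags are packed into one string in the column order COBOL, JAVA, SQL, VB.

```python
# Column order of the boolean flags in the Employees table above.
LANG_COLUMNS = ("COBOL", "JAVA", "SQL", "VB")

# EMPID -> one T/F flag per language column (other Employees columns omitted).
flags = {
    23: "TTTF", 25: "FFFF", 26: "FFFF", 31: "FFTF",
    32: "TTTT", 35: "FFFF", 36: "FTTT", 37: "TFTF",
}


def unpivot(flags, columns):
    """Turn one-column-per-language flags into (EMPID, LANGUAGE) rows, Schema 1 style."""
    return [
        (empid, lang)
        for empid, row in sorted(flags.items())
        for lang, flag in zip(columns, row)
        if flag == "T"
    ]


programs = unpivot(flags, LANG_COLUMNS)
print(len(programs))  # 13
```

Most of the 32 flags are F and carry no fact, and a new language means a new column in the table and in `LANG_COLUMNS`.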


2nd Normal Form

Second Normal Form is violated if:

• First Normal Form is violated, or

• There exists a non-key field (or fields) that is functionally dependent on a partial key, i.e., on only part of a composite primary key.

(partial key → non-key)


2NF Example – Raw Data

JE #1  02-JAN-2003
   100 Cash                  20,000
       310 Smith-Capital               20,000
   (owner investment)

JE #2  03-JAN-2003
   100 Cash                  30,000
       220 Notes Payable               30,000
   (borrowed money)

JE #3  03-JAN-2003
   120 Supplies               5,000
       100 Cash                         1,000
       220 Notes Payable                4,000
   (purchased supplies)


2NF Example – Violation

Transactions Table

JENO  LINENO  DATE         DESCRIPTION         ACCTNO  ACCTNAME       AMOUNT
1     1       02-JAN-2003  Owner investment    100     Cash           20,000
1     2       02-JAN-2003  Owner investment    310     Smith-Capital  (20,000)
2     1       03-JAN-2003  Borrowed money      100     Cash           30,000
2     2       03-JAN-2003  Borrowed money      220     Notes Payable  (30,000)
3     1       03-JAN-2003  Purchased Supplies  120     Supplies       5,000
3     2       03-JAN-2003  Purchased Supplies  100     Cash           (1,000)
3     3       03-JAN-2003  Purchased Supplies  220     Notes Payable  (4,000)

Is there a non-key field which is functionally dependent on a partial key?
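A functional dependency X → Y holds when no two rows agree on X but differ on Y, so a small script can look for partial dependencies in the sample rows. Sample data can only refute an FD, never prove it; the real FDs come from the meaning of the fields. A minimal sketch, assuming the key is JENO + LINENO and that amounts shown in parentheses are stored as negative integers; `fd_holds` and `partial_dependencies` are hypothetical helpers.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the `lhs` columns but differ on the `rhs` columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """(proper subset of key, non-key column) pairs whose FD holds in the sample."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for col in non_key:
                if fd_holds(rows, part, (col,)):
                    found.append((part, col))
    return found


COLS = ("JENO", "LINENO", "DATE", "DESCRIPTION", "ACCTNO", "ACCTNAME", "AMOUNT")
DATA = [
    (1, 1, "02-JAN-2003", "Owner investment", 100, "Cash", 20000),
    (1, 2, "02-JAN-2003", "Owner investment", 310, "Smith-Capital", -20000),
    (2, 1, "03-JAN-2003", "Borrowed money", 100, "Cash", 30000),
    (2, 2, "03-JAN-2003", "Borrowed money", 220, "Notes Payable", -30000),
    (3, 1, "03-JAN-2003", "Purchased Supplies", 120, "Supplies", 5000),
    (3, 2, "03-JAN-2003", "Purchased Supplies", 100, "Cash", -1000),
    (3, 3, "03-JAN-2003", "Purchased Supplies", 220, "Notes Payable", -4000),
]
rows = [dict(zip(COLS, t)) for t in DATA]
KEY = ("JENO", "LINENO")
non_key = [c for c in COLS if c not in KEY]

print(partial_dependencies(rows, KEY, non_key))
# [(('JENO',), 'DATE'), (('JENO',), 'DESCRIPTION')]
```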


2NF Example – Violation

FDs that indicate violation of 2NF, given that the key of the Transactions table is JENO + LINENO:

• JENO → DATE

• JENO → DESCRIPTION

DATE and DESCRIPTION depend on JENO alone, which is only part of the key, so they are repeated on every line of the same journal entry.


2NF Example – Corrected

Transactions Table

JENO  LINENO  ACCTNO  ACCTNAME       AMOUNT
1     1       100     Cash           20,000
1     2       310     Smith-Capital  (20,000)
2     1       100     Cash           30,000
2     2       220     Notes Payable  (30,000)
3     1       120     Supplies       5,000
3     2       100     Cash           (1,000)
3     3       220     Notes Payable  (4,000)

Journal_Entry Table

JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies
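The 2NF decomposition should be lossless: joining Transactions and Journal_Entry on JENO should give back exactly the original rows. A minimal sketch using sets of tuples; `project` and `natural_join` are hypothetical helpers, not a library API, and parenthesized amounts are stored as negative integers.

```python
COLS = ("JENO", "LINENO", "DATE", "DESCRIPTION", "ACCTNO", "ACCTNAME", "AMOUNT")
ORIGINAL = {
    (1, 1, "02-JAN-2003", "Owner investment", 100, "Cash", 20000),
    (1, 2, "02-JAN-2003", "Owner investment", 310, "Smith-Capital", -20000),
    (2, 1, "03-JAN-2003", "Borrowed money", 100, "Cash", 30000),
    (2, 2, "03-JAN-2003", "Borrowed money", 220, "Notes Payable", -30000),
    (3, 1, "03-JAN-2003", "Purchased Supplies", 120, "Supplies", 5000),
    (3, 2, "03-JAN-2003", "Purchased Supplies", 100, "Cash", -1000),
    (3, 3, "03-JAN-2003", "Purchased Supplies", 220, "Notes Payable", -4000),
}


def project(rows, cols, keep):
    """Relational projection of tuples laid out as `cols` onto `keep` (duplicates removed)."""
    idx = [cols.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, left_cols, right, right_cols):
    """Natural join of two sets of tuples on their shared columns; returns (columns, rows)."""
    common = [c for c in left_cols if c in right_cols]
    out_cols = tuple(left_cols) + tuple(c for c in right_cols if c not in left_cols)
    out = set()
    for lrow in left:
        ld = dict(zip(left_cols, lrow))
        for rrow in right:
            rd = dict(zip(right_cols, rrow))
            if all(ld[c] == rd[c] for c in common):
                merged = {**ld, **rd}
                out.add(tuple(merged[c] for c in out_cols))
    return out_cols, out


T_COLS = ("JENO", "LINENO", "ACCTNO", "ACCTNAME", "AMOUNT")
J_COLS = ("JENO", "DATE", "DESCRIPTION")
transactions = project(ORIGINAL, COLS, T_COLS)
journal_entry = project(ORIGINAL, COLS, J_COLS)

joined_cols, joined = natural_join(transactions, T_COLS, journal_entry, J_COLS)
lossless = joined == project(ORIGINAL, COLS, joined_cols)
print(len(transactions), len(journal_entry), lossless)  # 7 3 True
```

DATE and DESCRIPTION are now stored 3 times instead of 7, and the join loses nothing because JENO is the key of Journal_Entry.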


3rd Normal Form

Third Normal Form is violated if:

• Second Normal Form is violated, or

• There exists a non-key field (or fields) that is functionally dependent on another non-key field (or fields).

(non-key → non-key)

Note: A candidate key is not a non-key field.


3NF Example – Violation

Start from the Transactions and Journal_Entry tables of the corrected 2NF example.

Are there any non-key fields which functionally determine another non-key field?

Are there any redundant facts?


3NF Example – Violation

FD that indicates violation of 3NF, in the Transactions table:

• ACCTNO → ACCTNAME

Both fields are non-key, so each account name is repeated on every transaction line that uses the account.

Anomalies if not corrected:

• Update: if the name of account 100 changes, it must be changed in multiple places, risking inconsistency.

• Deletion: JE #3 and its transactions cannot be deleted without losing the information about account 120.

• Insertion: a new account, Jones-Capital, cannot be set up for a new partner until there is a transaction involving that account.
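The update anomaly can be reproduced on the Transactions table. A minimal sketch; `fd_holds` is a hypothetical helper, and the new account name "Cash on Hand" is made up for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the `lhs` columns but differ on the `rhs` columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


COLS = ("JENO", "LINENO", "ACCTNO", "ACCTNAME", "AMOUNT")
transactions = [dict(zip(COLS, t)) for t in [
    (1, 1, 100, "Cash", 20000),
    (1, 2, 310, "Smith-Capital", -20000),
    (2, 1, 100, "Cash", 30000),
    (2, 2, 220, "Notes Payable", -30000),
    (3, 1, 120, "Supplies", 5000),
    (3, 2, 100, "Cash", -1000),
    (3, 3, 220, "Notes Payable", -4000),
]]

# ACCTNO -> ACCTNAME holds, and neither column is part of the key (JENO, LINENO).
consistent_before = fd_holds(transactions, ("ACCTNO",), ("ACCTNAME",))

# Update anomaly: account 100 appears on three lines; renaming it on only one
# of them (to a hypothetical new name) leaves the table contradicting itself.
copies_of_100 = sum(1 for t in transactions if t["ACCTNO"] == 100)
transactions[0]["ACCTNAME"] = "Cash on Hand"
consistent_after = fd_holds(transactions, ("ACCTNO",), ("ACCTNAME",))

print(copies_of_100, consistent_before, consistent_after)  # 3 True False
```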


3NF Example – Corrected

Transactions Table

JENO  LINENO  ACCTNO  AMOUNT
1     1       100     20,000
1     2       310     (20,000)
2     1       100     30,000
2     2       220     (30,000)
3     1       120     5,000
3     2       100     (1,000)
3     3       220     (4,000)

Journal_Entry Table

JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Accounts Table

ACCTNO  ACCTNAME
100     Cash
310     Smith-Capital
220     Notes Payable
120     Supplies
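A minimal SQLite sketch of the corrected 3NF design, via Python's `sqlite3`, shows the three anomalies going away and a join reassembling the original wide rows on demand. The column types, the foreign keys, and account number 320 for Jones-Capital are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Accounts (
    ACCTNO   INTEGER PRIMARY KEY,
    ACCTNAME TEXT NOT NULL
);
CREATE TABLE Journal_Entry (
    JENO        INTEGER PRIMARY KEY,
    DATE        TEXT NOT NULL,
    DESCRIPTION TEXT
);
CREATE TABLE Transactions (
    JENO   INTEGER NOT NULL REFERENCES Journal_Entry(JENO),
    LINENO INTEGER NOT NULL,
    ACCTNO INTEGER NOT NULL REFERENCES Accounts(ACCTNO),
    AMOUNT INTEGER NOT NULL,  -- amounts shown in parentheses stored as negatives
    PRIMARY KEY (JENO, LINENO)
);
""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [
    (100, "Cash"), (310, "Smith-Capital"), (220, "Notes Payable"), (120, "Supplies"),
])
conn.executemany("INSERT INTO Journal_Entry VALUES (?, ?, ?)", [
    (1, "02-JAN-2003", "Owner investment"),
    (2, "03-JAN-2003", "Borrowed money"),
    (3, "03-JAN-2003", "Purchased Supplies"),
])
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", [
    (1, 1, 100, 20000), (1, 2, 310, -20000),
    (2, 1, 100, 30000), (2, 2, 220, -30000),
    (3, 1, 120, 5000), (3, 2, 100, -1000), (3, 3, 220, -4000),
])

# Joining the three tables reassembles the original wide rows.
wide = conn.execute("""
    SELECT t.JENO, t.LINENO, j.DATE, j.DESCRIPTION, t.ACCTNO, a.ACCTNAME, t.AMOUNT
    FROM Transactions t
    JOIN Journal_Entry j ON j.JENO = t.JENO
    JOIN Accounts a ON a.ACCTNO = t.ACCTNO
    ORDER BY t.JENO, t.LINENO""").fetchall()

# Insertion: a new partner's account can exist before any transaction uses it.
conn.execute("INSERT INTO Accounts VALUES (320, 'Jones-Capital')")  # 320 is hypothetical

# Deletion: removing JE #3 no longer loses the fact that account 120 is Supplies.
conn.execute("DELETE FROM Transactions WHERE JENO = 3")
conn.execute("DELETE FROM Journal_Entry WHERE JENO = 3")
supplies = conn.execute("SELECT ACCTNAME FROM Accounts WHERE ACCTNO = 120").fetchone()

print(len(wide), supplies)  # 7 ('Supplies',)
```

The update anomaly disappears for the same reason: renaming account 100 is now a single `UPDATE` on one Accounts row.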


3NF Example – Corrected: Final Dependencies

• Transactions: JENO, LINENO → ACCTNO, AMOUNT

• Journal_Entry: JENO → DATE, DESCRIPTION

• Accounts: ACCTNO → ACCTNAME

All non-key fields are functionally dependent on the PK, and only on the PK.


Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form is violated if:

• Third Normal Form is violated, or

• There exists a partial key that is functionally dependent on a non-key field (or fields).

(non-key → partial key)


BCNF Example – Semantics

• A student can have more than one major

• A student has a different advisor for each major.

• Each advisor advises for only one major.


BCNF Example – Violation

Student_Majors Table

SID  MAJOR             ADVISOR
1    PHYSICS           EINSTEIN
1    BIOLOGY           LIVINGSTON
2    PHYSICS           BOHR
2    COMPUTER SCIENCE  CODD
3    PHYSICS           EINSTEIN
4    BIOLOGY           LIVINGSTON
4    ACCOUNTING        PACIOLI
5    PHYSICS           EINSTEIN
6    PHYSICS           BOHR
6    BIOLOGY           DARWIN
7    COMPUTER SCIENCE  CODD
7    BIOLOGY           DARWIN

Does this relation violate third normal form?

Are there any redundant facts?


BCNF Example – Violation

FD that violates BCNF in the Student_Majors table (key: SID + MAJOR):

• ADVISOR → MAJOR

ADVISOR is a non-key field, yet it determines MAJOR, which is part of the key. It is important that you convince yourself that MAJOR does not functionally determine ADVISOR: PHYSICS, for example, has both EINSTEIN and BOHR as advisors.
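Both FDs can be tested on the sample rows. A minimal sketch; `fd_holds` and `unique_in_sample` are hypothetical helpers, and the final test uses the textbook form of BCNF (the determinant of every nontrivial FD must be a superkey), which the slides phrase in terms of partial keys. As before, data can only refute an FD; the semantics stated earlier are what establish ADVISOR → MAJOR.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the `lhs` columns but differ on the `rhs` columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def unique_in_sample(rows, cols):
    """True if no two rows share values for `cols` (a superkey test on this sample only)."""
    values = [tuple(row[c] for c in cols) for row in rows]
    return len(values) == len(set(values))


COLS = ("SID", "MAJOR", "ADVISOR")
student_majors = [dict(zip(COLS, t)) for t in [
    (1, "PHYSICS", "EINSTEIN"), (1, "BIOLOGY", "LIVINGSTON"),
    (2, "PHYSICS", "BOHR"), (2, "COMPUTER SCIENCE", "CODD"),
    (3, "PHYSICS", "EINSTEIN"), (4, "BIOLOGY", "LIVINGSTON"),
    (4, "ACCOUNTING", "PACIOLI"), (5, "PHYSICS", "EINSTEIN"),
    (6, "PHYSICS", "BOHR"), (6, "BIOLOGY", "DARWIN"),
    (7, "COMPUTER SCIENCE", "CODD"), (7, "BIOLOGY", "DARWIN"),
]]

advisor_to_major = fd_holds(student_majors, ("ADVISOR",), ("MAJOR",))
major_to_advisor = fd_holds(student_majors, ("MAJOR",), ("ADVISOR",))
advisor_is_key = unique_in_sample(student_majors, ("ADVISOR",))
sid_major_is_key = unique_in_sample(student_majors, ("SID", "MAJOR"))

# Textbook BCNF: a nontrivial FD whose determinant is not a superkey is a violation.
violates_bcnf = advisor_to_major and not advisor_is_key
print(advisor_to_major, major_to_advisor, violates_bcnf)  # True False True
```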


BCNF Example – Corrected

Student_Advisors Table

SID  ADVISOR
1    EINSTEIN
1    LIVINGSTON
2    BOHR
2    CODD
3    EINSTEIN
4    LIVINGSTON
4    PACIOLI
5    EINSTEIN
6    BOHR
6    DARWIN
7    CODD
7    DARWIN

Advisors Table

MAJOR             ADVISOR
PHYSICS           EINSTEIN
BIOLOGY           LIVINGSTON
PHYSICS           BOHR
COMPUTER SCIENCE  CODD
ACCOUNTING        PACIOLI
BIOLOGY           DARWIN

Note that if the key of the original Student_Majors table had (counter-intuitively) been defined as SID & ADVISOR, this would have been a 2NF violation, because MAJOR would then depend on ADVISOR, only part of that key.
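The BCNF decomposition is lossless because ADVISOR, the join column, is the key of the Advisors table. A minimal sketch using sets of tuples; the variable names are illustrative.

```python
student_majors = {
    (1, "PHYSICS", "EINSTEIN"), (1, "BIOLOGY", "LIVINGSTON"),
    (2, "PHYSICS", "BOHR"), (2, "COMPUTER SCIENCE", "CODD"),
    (3, "PHYSICS", "EINSTEIN"), (4, "BIOLOGY", "LIVINGSTON"),
    (4, "ACCOUNTING", "PACIOLI"), (5, "PHYSICS", "EINSTEIN"),
    (6, "PHYSICS", "BOHR"), (6, "BIOLOGY", "DARWIN"),
    (7, "COMPUTER SCIENCE", "CODD"), (7, "BIOLOGY", "DARWIN"),
}

# Decompose into the two corrected tables.
student_advisors = {(sid, advisor) for sid, _, advisor in student_majors}
advisors = {(major, advisor) for _, major, advisor in student_majors}

# ADVISOR must be a key of Advisors: each advisor appears with one major only.
advisor_is_key = len({a for _, a in advisors}) == len(advisors)

# Natural join on ADVISOR rebuilds the (SID, MAJOR, ADVISOR) rows.
rejoined = {
    (sid, major, advisor)
    for sid, advisor in student_advisors
    for major, adv2 in advisors
    if adv2 == advisor
}
print(len(student_advisors), len(advisors), advisor_is_key, rejoined == student_majors)
# 12 6 True True
```

The fact "EINSTEIN advises PHYSICS", stored three times in Student_Majors, is now stored once.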


4th Normal Form

4th Normal Form is violated if:

• Boyce-Codd Normal Form is violated, or

• There exists a partial key that has multiple independent multi-valued dependencies to other partial keys.

(partial-key1 ↠ partial-key2, partial-key1 ↠ partial-key3)


4NF Example – Violation

Instruments_Languages Table

Name  Language  Instrument
Fred  French    Piano
Fred  Italian   Flute
Fred  Spanish   Flute
Jane  French    Piano
Jane  French    Oboe
Sam   French    Piano
Sam   Spanish   Oboe
Sam   Spanish   Flute


4NF Example – Violation

Consider the same Instruments_Languages table.

Does this relation violate 1st, 2nd, 3rd, or BCNF?

Are there any redundant facts?


4NF Example – Correction

LanguagesSpoken

Name  Language
Fred  French
Fred  Italian
Fred  Spanish
Jane  French
Sam   French
Sam   Spanish

InstrumentsPlayed

Name  Instrument
Fred  Piano
Fred  Flute
Jane  Piano
Jane  Oboe
Sam   Piano
Sam   Oboe
Sam   Flute
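Projecting the original table onto the two corrected tables and joining them back shows what independence implies: if languages and instruments are unrelated, a single table that pairs them consistently would need every combination for each person. A minimal sketch; the variable names are illustrative. Note that the 8-row original pairs languages with instruments arbitrarily (Fred with Italian and Flute, for instance), so it is a strict subset of that full set of combinations.

```python
instruments_languages = [
    ("Fred", "French", "Piano"), ("Fred", "Italian", "Flute"), ("Fred", "Spanish", "Flute"),
    ("Jane", "French", "Piano"), ("Jane", "French", "Oboe"),
    ("Sam", "French", "Piano"), ("Sam", "Spanish", "Oboe"), ("Sam", "Spanish", "Flute"),
]

# The two 4NF tables are projections of the original.
languages_spoken = sorted({(name, lang) for name, lang, _ in instruments_languages})
instruments_played = sorted({(name, inst) for name, _, inst in instruments_languages})

# Under Name ->> Language and Name ->> Instrument (independent facts), one table
# would have to hold every language/instrument combination for each person.
all_combinations = sorted(
    (name, lang, inst)
    for name, lang in languages_spoken
    for name2, inst in instruments_played
    if name == name2
)
print(len(languages_spoken), len(instruments_played), len(all_combinations))  # 6 7 14
```

The decomposed design stores 6 + 7 independent facts, each once, instead of 14 combined rows.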