Normalization 6. CSE2132 Database Systems Week 6 Lecture Normalization


Page 1: Normalization 6. 1 CSE2132 Database Systems Week 6 Lecture Normalization

Normalization 6. 1

CSE2132 Database Systems

Week 6 Lecture

Normalization


Normalization 6. 2

Week 5 lecture review: Logical Database Design

Steps

1. Conceptual Model (ER Diagram) mapped onto a logical model dependent on the DBMS characteristics.

2. De-normalization (Optimize for efficiency).

• Combining tables to avoid doing joins

• Create more tables - Horizontal and Vertical partitioning

• Data replication (Redundancy)

• Combination of the above

Normalised relations solve data maintenance problems and minimise redundancy but, if implemented directly as physical records, they may not yield efficient data processing.

NB: Only use De-normalisation to gain explicit processing speed when other design actions are not sufficient!


Normalization 6. 3

Goal of Relational Design

What Relations (tables) should exist and what Attributes (columns) should they contain?

• Avoid Redundancy if possible

- minimize storage space

• Avoid Anomalies (data that does not make business sense)

• Avoid Nulls

• Avoid Joins which produce spurious (false) tuples (rows)


Normalization 6. 4

Dependency Theory

"One truly scientific part of the field [of database design]"

Date 5th ed p.325

Relational database design - a mechanical approach to producing a database schema with certain desirable properties.

Following….

A review of normal forms and the problems they solve.


Normalization 6. 5

Data Normalization

Normalization is a formal process to decide which attributes should be grouped together. Primarily a tool/technique to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.

It provides a formal measure of why one grouping of attributes may be better than another.

Each Normal Form sets criteria that a relation must satisfy, and each one eliminates a different kind of redundancy.

Database operations applied to unnormalized relations may lead to anomalies. Normalized Relations will remain consistent following database operations and will store each fact only once.


Normalization 6. 6

Assumptions

A group of attributes has a natural “inherent” structure.

This structure is independent of the way the data is used.

Normalization

Introduced by E. F. Codd together with relational database theory.

Originally Codd defined three normal forms.

This was later expanded to include Boyce-Codd and fourth and fifth normal forms.


Normalization 6. 7

Anomalies

Consider the poorly structured relation ASSIGN

Person_Id  Project_budget  Project_Id  Time_Spent_on_Project
S75        32              P1          7
S75        40              P2          8
S79        32              P1          4
S79        27              P3          1
S80        40              P2          5
-          17              P4          -

Null Values are considered to be anomalies


Normalization 6. 8

Anomalies

Insertion Anomaly

add tuple (ASSIGN, <S85,35,P1,9>)

- two conflicting budgets for P1

Deletion Anomaly

delete tuple (ASSIGN, <S79,27,P3,1>)

- removes project budget for P3


Normalization 6. 9

Anomalies

Update Anomaly

update tuple (ASSIGN, <S75,32,P1,7>,<S75,35,P1,7>)

This example tries to update the budget for P1. But P1 is also listed in the row with S79 ...

• either multiple updates or the potential for inconsistency ...
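The three anomalies can be reproduced on the ASSIGN rows from slide 6.7. The following is a minimal Python sketch, not part of the lecture: the row with nulls is left out, and the helper names `budgets` and `conflicts` are made up for illustration.

```python
# ASSIGN as on slide 6.7 (the row containing nulls is omitted).
# Columns: (person_id, project_budget, project_id, time_spent)
ASSIGN = [
    ("S75", 32, "P1", 7),
    ("S75", 40, "P2", 8),
    ("S79", 32, "P1", 4),
    ("S79", 27, "P3", 1),
    ("S80", 40, "P2", 5),
]


def budgets(rows):
    """Map each project id to the set of budgets recorded for it."""
    result = {}
    for _person, budget, project, _time in rows:
        result.setdefault(project, set()).add(budget)
    return result


def conflicts(rows):
    """Return the projects that carry more than one budget value."""
    return {p: b for p, b in budgets(rows).items() if len(b) > 1}


# Insertion anomaly: the new tuple disagrees with the stored budget for P1.
after_insert = ASSIGN + [("S85", 35, "P1", 9)]

# Deletion anomaly: removing S79's work on P3 also removes P3's budget.
after_delete = [r for r in ASSIGN if r != ("S79", 27, "P3", 1)]

# Update anomaly: changing only one of the two P1 rows leaves them inconsistent.
after_update = [
    ("S75", 35, "P1", 7) if r == ("S75", 32, "P1", 7) else r for r in ASSIGN
]

print(conflicts(after_insert))        # P1 now has two budgets
print("P3" in budgets(after_delete))  # False: the budget for P3 is gone
print(conflicts(after_update))        # P1 now has two budgets
```

In each case the problem comes from storing the project budget once per assignment instead of once per project.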


Normalization 6. 10

Normalization and Functional Dependencies

• Normalization is based on the analysis of Functional Dependencies.

• Functional dependency = constraint between two attributes or two sets of attributes.


Normalization 6. 11

Functional Dependencies

X → Y

The values of one set of attributes determine the value of another attribute.

The value of X determines the value of Y.

The value of Y is functionally dependent on the value of X. Y is a fact about X.

The simplest case is one attribute determining another single attribute. Often two or three attributes are needed to determine another single attribute.
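A functional dependency X → Y can be tested against sample data: no two rows may agree on X and differ on Y. The sketch below is an illustration in Python, with `fd_holds` a hypothetical helper name and the ASSIGN rows of slide 6.7 (minus the null row) as data. Note that sample data can only refute a dependency; whether it really holds is a business rule.

```python
# ASSIGN from slide 6.7, without the row that contains nulls.
COLUMNS = ("Person_Id", "Project_budget", "Project_Id", "Time_Spent_on_Project")
ASSIGN = [
    dict(zip(COLUMNS, values))
    for values in [
        ("S75", 32, "P1", 7),
        ("S75", 40, "P2", 8),
        ("S79", 32, "P1", 4),
        ("S79", 27, "P3", 1),
        ("S80", 40, "P2", 5),
    ]
]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(ASSIGN, ["Project_Id"], ["Project_budget"]))  # True
print(fd_holds(ASSIGN, ["Person_Id"], ["Project_budget"]))   # False
print(fd_holds(ASSIGN, ["Person_Id", "Project_Id"], ["Time_Spent_on_Project"]))  # True
```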


Normalization 6. 12

Functional Dependencies

Referring to slide 6.7 ...

Project_id → Project Budget

Person_Id, Project_id → Time Spent on Project

Alternative Representation: Functional Dependency Diagram

[Diagram: Project_id → Project Budget]


Normalization 6. 13

Task: Write down all the Functional Dependencies

EMPLOYEE1(Emp_id, Name, birthdate, salary)

Answer:

EMPLOYEE2(Emp_id, Course_id, Name, salary, date_completed)

Answer:


Normalization 6. 14

First Normal Form (1NF)

A table is in 1NF if:

• it contains no repeating groups (i.e. no multi-valued attributes)

• every attribute is atomic

(The Relational Model does not handle repeating groups.)

The relationship between key and non-key fields will be one-to-one (1:1) or one-to-many (1:N).


Normalization 6. 15

First Normal Form (Example)

• Remove Repeating Groups

• All occurrences in a relation must have the same number of fields

Relation: STUDENT(STUD#,SNAME(SUBCODE,TITLE,RESULT))

1NF Relations: STUDENT(STUD#,SNAME)

STUDENT-RESULT(STUD#,SUBCODE,TITLE,RESULT)
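Removing the repeating group amounts to copying the student key onto every result row. A small Python sketch of that step follows; the student numbers, names, subject titles and results are invented sample data, and `to_1nf` is a hypothetical helper name.

```python
# Unnormalized STUDENT records: each student carries a repeating group of
# (SUBCODE, TITLE, RESULT) entries. All values are invented sample data.
unnormalized = [
    {"stud": 1001, "sname": "LEE",
     "results": [("CSE2132", "Database Systems", "D"),
                 ("CSE1301", "Programming", "C")]},
    {"stud": 1002, "sname": "KIM",
     "results": [("CSE2132", "Database Systems", "HD")]},
]


def to_1nf(students):
    """Split nested student records into STUDENT and STUDENT-RESULT rows."""
    student = [(s["stud"], s["sname"]) for s in students]
    # Each repeating-group entry becomes one row, prefixed with the student key.
    student_result = [
        (s["stud"], subcode, title, result)
        for s in students
        for (subcode, title, result) in s["results"]
    ]
    return student, student_result


student, student_result = to_1nf(unnormalized)
print(student)
for row in student_result:
    print(row)
```

Every row of each resulting relation now has the same number of fields, and (STUD#, SUBCODE) identifies a row of STUDENT-RESULT.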


Normalization 6. 16

Second Normal Form

A relation is in 2NF if:

• it is in 1NF, and

• every non-key attribute is fully functionally dependent on the whole key.

Problems with relations not in 2NF:

- repeated information

- update anomalies

- potential inconsistency

- delete anomalies


Normalization 6. 17

Second Normal Form (Example)

• Remove Partial Dependencies

• A non-key attribute cannot be identified by part of a composite key

Relation: ORDER-ITEM(ORDER#,ITEM#,DESC,QTY)

2NF Relations: ORDER-ITEM(ORDER#,ITEM#,QTY)

ITEM(ITEM#,DESC)


Normalization 6. 18

Anomalies due to Partial Dependencies

ORDER-ITEM

ORDER#  ITEM#  DESC    QTY
27      873    NUT     2
28      402    BOLT    1
28      873    NUT     10
30      495    WASHER  50

UPDATE - change DESC in many places

DELETE - data for ITEM is lost when ORDER is deleted

INSERT - cannot create a new ITEM until an ORDER requires that ITEM


Normalization 6. 19

Solution to 2NF Anomalies

ORDER-ITEM

ORDER#  ITEM#  QTY
27      873    2
28      402    1
28      873    10
30      495    50

ITEM

ITEM#  DESC
873    NUT
402    BOLT
495    WASHER

• Delete Order# 30 and washer still remains

• Add a new Item at any time

• Update BOLT in one place only
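The 2NF decomposition is a pair of projections, and joining them on ITEM# gives back the original rows. The Python sketch below illustrates this on the ORDER-ITEM data from the slides; `project` is a hypothetical helper and columns are addressed by position for brevity.

```python
# ORDER-ITEM before decomposition: (ORDER#, ITEM#, DESC, QTY)
ORDER_ITEM = [
    (27, 873, "NUT", 2),
    (28, 402, "BOLT", 1),
    (28, 873, "NUT", 10),
    (30, 495, "WASHER", 50),
]


def project(rows, indexes):
    """Project rows onto the given column positions, dropping duplicates."""
    return sorted({tuple(r[i] for i in indexes) for r in rows})


order_item = project(ORDER_ITEM, (0, 1, 3))  # (ORDER#, ITEM#, QTY)
item = project(ORDER_ITEM, (1, 2))           # (ITEM#, DESC), one row per item

# Joining the two relations on ITEM# rebuilds exactly the original rows.
desc_of = dict(item)
rejoined = [(o, i, desc_of[i], q) for (o, i, q) in order_item]
print(sorted(rejoined) == sorted(ORDER_ITEM))

# Deleting order 30 no longer loses the description of item 495.
order_item_after = [r for r in order_item if r[0] != 30]
print(desc_of[495])
```

NUT is stored once in ITEM even though item 873 appears on two orders, which is what removes the update anomaly.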


Normalization 6. 20

Third Normal Form

A relation is in 3NF if:

• it is in 2NF, and

• it contains no transitive dependencies

A functional dependency between two (or more) nonkey attributes gives rise to a transitive dependency.

3NF is violated when a non-key field is a fact about another non-key field (thus a functional dependency exists between them).

Problems with relations not in 3NF: as for 2NF


Normalization 6. 21

Third Normal Form (Example)

• Remove Transitive Dependencies

• A non-key attribute cannot be identified by another non-key attribute.

Relation: EMPLOYEE(EMP#,ENAME,DEPT#,DNAME)

The functional dependency between the nonkey attributes (DEPT# and DNAME) gives rise to a transitive dependency (EMP# → DNAME). Remove this transitive dependency.

EMP# → DEPT#

DEPT# → DNAME

therefore EMP# → DNAME (transitively)

3NF Relations: EMPLOYEE(EMP#,ENAME,DEPT#)

DEPARTMENT(DEPT#,DNAME)
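The transitive step can be derived mechanically with the standard attribute-closure procedure: keep applying dependencies whose left side is already covered until nothing new is added. A minimal Python sketch follows, assuming EMP# → ENAME in addition to the two dependencies listed; the function name `closure` is chosen here for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies of EMPLOYEE(EMP#, ENAME, DEPT#, DNAME).
# EMP# -> ENAME is an assumption; the slide only lists the other two.
FDS = [
    (("EMP#",), ("ENAME", "DEPT#")),
    (("DEPT#",), ("DNAME",)),
]

print(sorted(closure({"EMP#"}, FDS)))   # includes DNAME, reached via DEPT#
print(sorted(closure({"DEPT#"}, FDS)))  # DEPT# and DNAME only
```

DNAME appears in the closure of EMP# only because DEPT# was added first, which is the transitive dependency the slide removes.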


Normalization 6. 22

Anomalies due to Transitive Dependencies

EMPLOYEE

EMP#  ENAME  DEPT#  DNAME
10    SMITH  D5     EDP
20    JONES  D7     FINANCE
25    SMITH  D7     FINANCE
30    BLACK  D8     SALES

UPDATE - change DNAME in many places

DELETE - data for DEPT is lost when last EMP is deleted for DEPT

INSERT - cannot create a new DEPT until an EMP starts for that DEPT


Normalization 6. 23

Solution to 3NF Anomalies

EMPLOYEE

EMP#  ENAME  DEPT#
10    SMITH  D5
20    JONES  D7
25    SMITH  D7
30    BLACK  D8

DEPARTMENT

DEPT#  DNAME
D5     EDP
D7     FINANCE
D8     SALES

• DELETE last EMP but DEPT still remains

• ADD new DEPT at any time

• UPDATE DNAME once
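The three remarks can be tried out on an in-memory SQLite database. This is an illustrative sketch using Python's standard `sqlite3` module: the table and column names are adapted (`emp_no`, `dept_no`) because `#` is awkward in SQL identifiers, and the new values 'ACCOUNTS', 'D9' and 'MARKETING' are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_no TEXT PRIMARY KEY,
    dname   TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    ename   TEXT NOT NULL,
    dept_no TEXT NOT NULL REFERENCES department(dept_no)
);
INSERT INTO department VALUES ('D5', 'EDP'), ('D7', 'FINANCE'), ('D8', 'SALES');
INSERT INTO employee VALUES
    (10, 'SMITH', 'D5'), (20, 'JONES', 'D7'),
    (25, 'SMITH', 'D7'), (30, 'BLACK', 'D8');
""")

# UPDATE DNAME once: a single row changes, both D7 employees see the new name.
conn.execute("UPDATE department SET dname = 'ACCOUNTS' WHERE dept_no = 'D7'")

# ADD new DEPT at any time: no employee is needed first.
conn.execute("INSERT INTO department VALUES ('D9', 'MARKETING')")

# DELETE last EMP but DEPT still remains: D8 survives the loss of employee 30.
conn.execute("DELETE FROM employee WHERE emp_no = 30")

rows = conn.execute("""
    SELECT e.emp_no, e.ename, d.dname
    FROM employee e JOIN department d ON d.dept_no = e.dept_no
    ORDER BY e.emp_no
""").fetchall()
departments = [r[0] for r in conn.execute(
    "SELECT dept_no FROM department ORDER BY dept_no")]

print(rows)
print(departments)
```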


Normalization 6. 24

A Simple Test for 3NF

Each attribute should depend on :

the key

the whole key

and nothing but the key

(so help me CODD)


Normalization 6. 25

Steps in Normalization


Normalization 6. 26

Example Problem

• Consider the following poorly formed relation. The HR department wishes to keep track of Employees, Departments, Jobs and Employee job assignments. The primary key of the relation is underlined.

ASSIGNMENT(EMP-ID, JOB-CODE,DEPT-NO,EMP_NAME, JOB-DESCR, DATE_JOB_ASSIGNED,DEPT-DESC)

• It is known that EMP_ID functionally determines EMP-NAME and DEPT-NO, DEPT-NO functionally determines DEPT-DESC and that JOB_CODE functionally determines JOB_DESCR. The system also needs to keep track of the date on which a specific employee has been assigned to a specific job. An employee can be assigned to more than one job over time.


Normalization 6. 27

The Question

[1] In what normal form (if any) is the relation as it appears above?

[2] Rewrite the above relation as a number of relations all of which are in third normal form. (It is not required to write down relations in 1st or 2nd normal form.)


Normalization 6. 28

One Approach to Solving

• Draw a data structure diagram (DSD) that is a best guess as to the final relations

• Identify the primary key in each relation

• Make sure each attribute is functionally dependent on the primary key attribute(s)

• Check a foreign key is present (at the many end) if the relation is related to some other relation

• Scan the resulting DSD for any omitted relationships, any repeating groups, partial dependencies or transitive dependencies

• If relationships are present include those relationships.

• If repeating groups, partial dependencies or transitive dependencies are present break down the offending relation further


Normalization 6. 29

An Answer

• It is in first normal form as there are no repeating groups.

• EMPLOYEE(EMP-ID,EMP_NAME,DEPT-NO)

• JOB(JOB-CODE,JOB-DESCR)

• ASSIGNMENT(EMP-ID, JOB-CODE, DATE_JOB_ASSIGNED)

• DEPARTMENT(DEPT-NO,DEPT-DESC)

[Data structure diagram: EMPLOYEE, JOB, ASSIGNMENT, DEPT]
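The answer can be cross-checked mechanically. The Python sketch below rests on two assumptions: the primary key is (EMP-ID, JOB-CODE), since the underlining is lost in this transcript, and (EMP-ID, JOB-CODE) → DATE_JOB_ASSIGNED, read from the problem statement. Grouping attributes by determinant works here because the given dependencies are already minimal; it is not a general 3NF synthesis algorithm. The `classify` rule (any determinant outside the key counts as transitive) is a simplification that is adequate for this example.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ATTRS = {"EMP-ID", "JOB-CODE", "DEPT-NO", "EMP_NAME",
         "JOB-DESCR", "DATE_JOB_ASSIGNED", "DEPT-DESC"}
KEY = frozenset({"EMP-ID", "JOB-CODE"})  # assumed primary key

FDS = [
    (frozenset({"EMP-ID"}), frozenset({"EMP_NAME", "DEPT-NO"})),
    (frozenset({"DEPT-NO"}), frozenset({"DEPT-DESC"})),
    (frozenset({"JOB-CODE"}), frozenset({"JOB-DESCR"})),
    (KEY, frozenset({"DATE_JOB_ASSIGNED"})),  # assumed from the problem text
]


def classify(lhs, key):
    """Label a dependency by how its determinant relates to the key."""
    if lhs == key:
        return "full"
    if lhs < key:
        return "partial"     # determinant is only part of the key: breaks 2NF
    return "transitive"      # determinant is a non-key attribute: breaks 3NF


kinds = {tuple(sorted(lhs)): classify(lhs, KEY) for lhs, _ in FDS}
# One relation per determinant: the determinant becomes that relation's key.
relations = {frozenset(lhs | rhs) for lhs, rhs in FDS}

print(closure(KEY, FDS) == ATTRS)  # the assumed key determines every attribute
for lhs, kind in kinds.items():
    print(lhs, kind)
for rel in relations:
    print(sorted(rel))
```

The partial dependencies on EMP-ID and JOB-CODE are why the original relation is only in first normal form, and the four relations produced match EMPLOYEE, JOB, ASSIGNMENT and DEPARTMENT above.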