41
Normalization A logical design method which minimizes data redundancy and reduces design flaws. Consists of applying various “normal” forms to the database design. The normal forms break down large tables into smaller subsets.

Normalisation revision

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Normalisation revision

Normalization

A logical design method which minimizes data redundancy and reduces design flaws.

•Consists of applying various “normal” forms to the database design.

•The normal forms break down large tables into smaller subsets.

Page 2: Normalisation revision

First Normal Form (1NF)

Each attribute must be atomic• No repeating columns within a row.• No multi-valued columns.

1NF simplifies attributes• Queries become easier.

Page 3: Normalisation revision

1NF

Employee (unnormalized)

emp_no name dept_no dept_name skills1 Kevin Jacobs 201 R&D C, Perl, Java2 Barbara Jones 224 IT Linux, Mac3 Jake Rivera 201 R&D DB2, Oracle, Java

emp_no name dept_no dept_name skills1 Kevin Jacobs 201 R&D C1 Kevin Jacobs 201 R&D Perl1 Kevin Jacobs 201 R&D Java2 Barbara Jones 224 IT Linux2 Barbara Jones 224 IT Mac3 Jake Rivera 201 R&D DB23 Jake Rivera 201 R&D Oracle3 Jake Rivera 201 R&D Java

Employee (1NF)

Page 4: Normalisation revision

Second Normal Form (2NF)

Each attribute must be functionally dependent on the primary key.

• Functional dependence - the property of one or more attributes that uniquely determines the value of other attributes.• Any non-dependent attributes are moved into a smaller (subset) table.

2NF improves data integrity.• Prevents update, insert, and delete anomalies.

Page 5: Normalisation revision

Functional Dependence

Name, dept_no, and dept_name are functionally dependent on emp_no. (emp_no -> name, dept_no, dept_name)

Skills is not functionally dependent on emp_no since it is not unique to each emp_no.

emp_no name dept_no dept_name skills1 Kevin Jacobs 201 R&D C1 Kevin Jacobs 201 R&D Perl1 Kevin Jacobs 201 R&D Java2 Barbara Jones 224 IT Linux2 Barbara Jones 224 IT Mac3 Jake Rivera 201 R&D DB23 Jake Rivera 201 R&D Oracle3 Jake Rivera 201 R&D Java

Employee (1NF)

Page 6: Normalisation revision

2NF

emp_no name dept_no dept_name skills1 Kevin Jacobs 201 R&D C1 Kevin Jacobs 201 R&D Perl1 Kevin Jacobs 201 R&D Java2 Barbara Jones 224 IT Linux2 Barbara Jones 224 IT Mac3 Jake Rivera 201 R&D DB23 Jake Rivera 201 R&D Oracle3 Jake Rivera 201 R&D Java

Employee (1NF)

emp_no name dept_no dept_name1 Kevin Jacobs 201 R&D2 Barbara Jones 224 IT3 Jake Rivera 201 R&D

Employee (2NF)emp_no skills1 C1 Perl1 Java2 Linux2 Mac3 DB23 Oracle3 Java

Skills (2NF)

Page 7: Normalisation revision

Data Integrity

• Insert Anomaly - adding null values. eg, inserting a new department does not require the primary key of emp_no to be added. • Update Anomaly - multiple updates for a single name change, causes performance degradation. eg, changing IT dept_name to IS• Delete Anomaly - deleting wanted information. eg, deleting the IT department removes employee Barbara Jones from the database

emp_no name dept_no dept_name skills1 Kevin Jacobs 201 R&D C1 Kevin Jacobs 201 R&D Perl1 Kevin Jacobs 201 R&D Java2 Barbara Jones 224 IT Linux2 Barbara Jones 224 IT Mac3 Jake Rivera 201 R&D DB23 Jake Rivera 201 R&D Oracle3 Jake Rivera 201 R&D Java

Employee (1NF)

Page 8: Normalisation revision

Third Normal Form (3NF)

Remove transitive dependencies.• Transitive dependence - two separate entities exist within one table.• Any transitive dependencies are moved into a smaller (subset) table.

3NF further improves data integrity.• Prevents update, insert, and delete anomalies.

Page 9: Normalisation revision

Transitive Dependence

Dept_no and dept_name are functionally dependent on emp_no however, department can be considered a separate entity.

emp_no name dept_no dept_name1 Kevin Jacobs 201 R&D2 Barbara Jones 224 IT3 Jake Rivera 201 R&D

Employee (2NF)

Page 10: Normalisation revision

3NF

emp_no name dept_no dept_name1 Kevin Jacobs 201 R&D2 Barbara Jones 224 IT3 Jake Rivera 201 R&D

Employee (2NF)

emp_no name dept_no1 Kevin Jacobs 2012 Barbara Jones 2243 Jake Rivera 201

Employee (3NF)

dept_no dept_name201 R&D224 IT

Department (3NF)

Page 11: Normalisation revision

Other Normal Forms

Boyce-Codd Normal Form (BCNF)• Strengthens 3NF by requiring the keys in the functional dependencies to be superkeys (a column or columns that uniquely identify a row)

Fourth Normal Form (4NF)• Eliminate trivial multivalued dependencies.

Fifth Normal Form (5NF)• Eliminate dependencies not determined by keys.

Page 12: Normalisation revision

Normalizing our webstore (1NF)

customers

itemsitem_id name price inventory34 sweater red 50 2135 sweater blue 50 1056 t-shirt 25 7672 jeans 75 581 jacket 175 9

cust_id name address credit_card_num credit_card_type45 Mike Speedy 123 A St. 45154 visa45 Mike Speedy 123 A St. 32499 mastercard45 Mike Speedy 123 A St. 12834 discover78 Frank Newmon 2 Main St. 45698 visa102 Joe Powers 343 Blue Blvd. 94065 mastercard102 Joe Powers 343 Blue Blvd. 10532 discover

ordersorder_id cust_id item_id quantity cost date status405 45 34 2 100 2/306 shipped405 45 35 1 50 2/306 shipped405 45 56 3 75 2/306 shipped408 78 56 2 50 3/5/06 refunded410 102 72 2 150 3/10/06 shipped410 102 81 1 175 3/10/06 shipped

Page 13: Normalisation revision

Normalizing our webstore (2NF & 3NF)

customers credit_cardscust_id name address45 Mike Speedy 123 A St.78 Frank Newmon 2 Main St.102 Joe Powers 343 Blue Blvd.

cust_id num type45 45154 visa45 32499 mastercar

d45 12834 discover78 45698 visa102 94065 mastercar

d102 10532 discover

Page 14: Normalisation revision

Normalizing our webstore (2NF & 3NF)

order detailsorder_id item_id quantity cost405 34 2 100405 35 1 50405 56 3 75408 56 2 50410 72 2 150410 81 1 175

itemsitem_id name price inventory34 sweater red 50 2135 sweater blue 50 1056 t-shirt 25 7672 jeans 75 581 jacket 175 9

order_id cust_id date status405 45 2/306 shipped408 78 3/5/06 refunded410 102 3/10/06 shipped

orders

Page 15: Normalisation revision

Normalisation 2

• Overview• normalise a relation to Boyce Codd Normal Form (BCNF)• An example

Page 16: Normalisation revision

Boyce-Codd Normal Form (BCNF)

• When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.

• 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys• i.e. composite candidate keys with at least one attribute in common.

• BCNF is based on the concept of a determinant.• A determinant is any attribute (simple or composite) on which some

other attribute is fully functionally dependent.

• A relation is in BCNF is, and only if, every determinant is a candidate key.

Page 17: Normalisation revision

The theory

• Consider the following relation and determinants.R(a,b,c,d)

a,c -> b,da,d -> b

• To be in BCNF, all valid determinants must be a candidate key. In the relation R, a,c->b,d is the determinate used, so the first determinate is fine.

• a,d->b suggests that a,d can be the primary key, which would determine b. However this would not determine c. This is not a candidate key, and thus R is not in BCNF.

Page 18: Normalisation revision

Example 1

Patient No Patient Name Appointment Id Time Doctor

1 John 0 09:00 Zorro

2 Kerr 0 09:00 Killer

3 Adam 1 10:00 Zorro

4 Robert 0 13:00 Killer

5 Zane 1 14:00 Zorro

Page 19: Normalisation revision

Two possible keys

• DB(Patno,PatName,appNo,time,doctor)• Determinants:

• Patno -> PatName• Patno,appNo -> Time,doctor• Time -> appNo

• Two options for 1NF primary key selection:• DB(Patno,PatName,appNo,time,doctor) (example 1a)• DB(Patno,PatName,appNo,time,doctor) (example 1b)

Page 20: Normalisation revision

Example 1a

• DB(Patno,PatName,appNo,time,doctor) • No repeating groups, so in 1NF• 2NF – eliminate partial key dependencies:

• DB(Patno,appNo,time,doctor)• R1(Patno,PatName)

• 3NF – no transient dependences so in 3NF• Now try BCNF.

Page 21: Normalisation revision

BCNF Every determinant is a candidate key

DB(Patno,appNo,time,doctor)R1(Patno,PatName)

• Is determinant a candidate key?• Patno -> PatName

Patno is present in DB, but not PatName, so irrelevant.

Page 22: Normalisation revision

Continued…

DB(Patno,appNo,time,doctor)R1(Patno,PatName)

• Patno,appNo -> Time,doctorAll LHS and RHS present so relevant. Is this a candidate key? Patno,appNo IS the key, so this is a candidate key.

• Time -> appNoTime is present, and so is appNo, so relevant. Is this a candidate key? If it was then we could rewrite DB as: DB(Patno,appNo,time,doctor)This will not work, so not BCNF.

Page 23: Normalisation revision

Rewrite to BCNF

• DB(Patno,appNo,time,doctor)R1(Patno,PatName)

• BCNF: rewrite to DB(Patno,time,doctor) R1(Patno,PatName) R2(time,appNo)

• time is enough to work out the appointment number of a patient. Now BCNF is satisfied, and the final relations shown are in BCNF

Page 24: Normalisation revision

Example 1b

• DB(Patno,PatName,appNo,time,doctor) • No repeating groups, so in 1NF• 2NF – eliminate partial key dependencies:

• DB(Patno,time,doctor)• R1(Patno,PatName) • R2(time,appNo)

• 3NF – no transient dependences so in 3NF• Now try BCNF.

Page 25: Normalisation revision

BCNF Every determinant is a candidate key

DB(Patno,time,doctor)R1(Patno,PatName) R2(time,appNo)

• Is determinant a candidate key?• Patno -> PatName

Patno is present in DB, but not PatName, irrelevant.• Patno,appNo -> Time,doctor

Not all LHS present so not relevant• Time -> appNo

Time is present, but not appNo, so not relevant. • Relations are in BCNF.

Page 26: Normalisation revision

Summary - Example 1

This example has demonstrated three things:

• BCNF is stronger than 3NF, relations that are in 3NF are not necessarily inBCNF

• BCNF is needed in certain situations to obtain full understanding of the data model

• there are several routes to take to arrive at the same set of relations in BCNF.• Unfortunately there are no rules as to which route will be the easiest

one to take.

Page 27: Normalisation revision

Example 2

Grade_report(StudNo,StudName,(Major,Adviser,(CourseNo,Ctitle,InstrucName,InstructLocn,Grade)))

• Functional dependencies• StudNo -> StudName• CourseNo -> Ctitle,InstrucName• InstrucName -> InstrucLocn• StudNo,CourseNo,Major -> Grade• StudNo,Major -> Advisor• Advisor -> Major

Page 28: Normalisation revision

Example 2 cont...

• UnnormalisedGrade_report(StudNo,StudName,(Major,Advisor, (CourseNo,Ctitle,InstrucName,InstructLocn,Grade)))

• 1NF Remove repeating groups• Student(StudNo,StudName)• StudMajor(StudNo,Major,Advisor)• StudCourse(StudNo,Major,CourseNo,

Ctitle,InstrucName,InstructLocn,Grade)

Page 29: Normalisation revision

Example 2 cont...

• 1NFStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo, Ctitle,InstrucName,InstructLocn,Grade)

• 2NF Remove partial key dependenciesStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName,InstructLocn)

Page 30: Normalisation revision

Example 2 cont...

• 2NFStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName,InstructLocn)

• 3NF Remove transitive dependenciesStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName)Instructor(InstructName,InstructLocn)

Page 31: Normalisation revision

Example 2 cont...

• BCNF Every determinant is a candidate key• Student : only determinant is StudNo• StudCourse: only determinant is StudNo,Major• Course: only determinant is CourseNo• Instructor: only determinant is InstrucName• StudMajor: the determinants are

• StudNo,Major, or• AdvisorOnly StudNo,Major is a candidate key.

Page 32: Normalisation revision

Example 2: BCNF

• BCNF

Student(StudNo,StudName)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName)Instructor(InstructName,InstructLocn)StudMajor(StudNo,Advisor)Adviser(Adviser,Major)

Page 33: Normalisation revision

Problems BCNF overcomes

• If the record for student 456 is deleted we lose not only information on student 456 but also the fact that DARWIN advises in BIOLOGY

• we cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor.

STUDENT MAJOR ADVISOR

123 PHYSICS EINSTEIN

123 MUSIC MOZART

456 BIOLOGY DARWIN

789 PHYSICS BOHR

999 PHYSICS EINSTEIN

Page 34: Normalisation revision

Split into two tables

In BCNF we have two tables

STUDENT ADVISOR

123 EINSTEIN

123 MOZART

456 DARWIN

789 BOHR

999 EINSTEIN

ADVISOR MAJOR

EINSTEIN PHYSICS

MOZART MUSIC

DARWIN BIOLOGY

BOHR PHYSICS

Page 35: Normalisation revision

Returning to the ER Model

• Now that we have reached the end of the normalisation process, you must go back and compare the resulting relations with the original ER model

• You may need to alter it to take account of the changes that have occurred during the normalisation process Your ER diagram should always be a prefect reflection of the model you are going to implement in the database, so keep it up to date!

• The changes required depends on how good the ER model was at first!

Page 36: Normalisation revision

Video Library Example

• A video library allows customers to borrow videos.• Assume that there is only 1 of each video.• We are told that:

video(title,director,serial)customer(name,addr,memberno)hire(memberno,serial,date)

title->director,serialserial->titleserial->directorname,addr -> membernomemberno -> name,addrserial,date -> memberno

Page 37: Normalisation revision

What NF is this?

• No repeating groups therefore at least 1NF• 2NF – A Composite key exists:

hire(memberno,serial,date)• Can memberno be found with just serial or date?• NO, therefore the relations are already in 2NF.

• 3NF?

Page 38: Normalisation revision

Test for 3NF

• video(title,director,serial)• title->director,serial• serial->director

• Director can be derived using serial, and serial and director are both non keys, so therefore this is a transitive or non-key dependency.

• Rewrite video…

Page 39: Normalisation revision

Rewrite for 3NF

• video(title,director,serial)• title->director,serial• serial->director

• Becomes:• video(title,serial)• serial(serial,director)

Page 40: Normalisation revision

Check BCNF

• Is every determinant a candidate key?• video(title,serial) - Determinants are:

• title->director,serial Candidate key• serial->title Candidate key • video in BCNF

• serial(serial,director) Determinants are: • serial->director Candidate key• serial in BCNF

Page 41: Normalisation revision

• customer(name,addr,memberno) Determinants are: • name,addr -> memberno Candidate key• memberno -> name,addr Candidate key • customer in BCNF

• hire(memberno,serial,date) Determinants are:• serial,date -> memberno Candidate key• hire in BCNF

• Therefore the relations are also now in BCNF.