BIS 1204: Data and Information Management I Evening ‘Group 2’ Presentation: Slide 1/48 MUKALELE Rogers MULINDA Saddati TUSABA Pauline Joan MUSANA Evans Kwesiga Allan 13/U/21067/EVE 13/U/21076/EVE 13/U/21363/EVE 13/U/21078/EVE 13/U/20996/EVE 213024992 213008565 213010254 Makerere University 23/06/22 - 07:00 AM GROUP ASSIGNMENT 3: BIS 1204: Data and Information Management I by Mr. ATUGONZA Martin NORMALIZATION

NORMALIZATION - BIS 1204: Data and Information Management I

BIS 1204: Data and Information Management I Evening ‘Group 2’ Presentation:

Slide 1/48

MUKALELE Rogers

MULINDA Saddati

TUSABA Pauline Joan

MUSANA Evans

Kwesiga Allan

13/U/21067/EVE

13/U/21076/EVE

13/U/21363/EVE

13/U/21078/EVE

13/U/20996/EVE

213024992 213008565

213003883 213004582

213010254

Makerere University

Monday 17 April 2023 - 11:42 PM

GROUP ASSIGNMENT 3:

BIS 1204: Data and Information Management I by Mr. ATUGONZA Martin

NORMALIZATION

Slide 2/48

Presentation Objectives

• The purpose of normalization.
• How normalization can be used during database design.
• The update anomalies associated with data redundancy.
• The concept of functional dependencies, which describe the relationship between attributes.
• How to undertake the process of normalization.
• How to identify the most commonly used normal forms: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
• Introduction to advanced normalization: Boyce–Codd Normal Form (BCNF) and higher normal forms.

Slide 3/48

The purpose of normalization

• When tasked with designing a relational database for an enterprise, we are given the user and data requirements, and our objective is to generate a set of relations (tables) that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily.

• The purpose of normalization is to identify a suitable set of relations that:
– support the data requirements of the enterprise;
– contain the minimal number of attributes necessary to support those requirements;
– keep attributes with a close logical relationship in the same relation;
– have minimal data redundancy (with the important exception of the foreign keys needed to join related relations).

Slide 4/48

Please Note:

Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.

Slide 5/48

How normalization can be used during database design

• Approach 1: Normalization can be used as a bottom-up, standalone database design technique to create a set of relations.

• Approach 2: Normalization can be used as a validation technique to check the structure of relations that may have been created using a top-down approach such as ER modeling.

• Whichever approach is used, the goal is the same: to create a set of well-designed relations that meet the data requirements of the enterprise.

Slide 6/48

How normalization can be used during database design (cont)

Slide 7/48

Data Redundancy

• Data redundancy results from storing copies of the same data in more than one place in a database (duplication of data). It can lead to inconsistencies if only some of the copies are modified, and thus to a loss of data integrity.

• Benefits of reducing data redundancy include:
– updates to the data stored in the database are achieved with a minimal number of operations;
– fewer opportunities for data inconsistencies;
– less file storage space required by the base relations, which minimizes costs.

Slide 8/48

Data Redundancy: Case Study

StudentDetail

We illustrate the problems associated with unwanted data redundancy by comparing the StudentDetail relation shown below with the Student, District and Hostel relations on the next slide.

Slide 9/48

Data Redundancy: Case Study (cont)

Student

District

Hostel

Slide 10/48

Data Redundancy

• The StudentDetail relation has redundant data: the details of a hostel are repeated for every student residing at that hostel.

• In contrast, hostel information appears only once for each hostel in the Hostel relation, and only HostelID is repeated in the Student relation, to record where each student resides.

• The same applies to districts: district details are stored once in the District relation instead of being repeated for every student.

Slide 11/48

Update Anomalies

• Relations that contain redundant information may potentially suffer from update anomalies. Types of update anomalies include:
– Modification
– Insertion
– Deletion

(a) Modification anomalies
• If we want to change the value of one of the attributes of a particular hostel in the StudentDetail relation, for example the address for hostel number HOST07, we must update the tuples of all students residing at that hostel.
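A minimal Python sketch of this modification anomaly follows. The StudentDetail table itself was an image on the original slide, so the rows and the attribute names regNo, hostelID and hAddress are hypothetical; only hostel number HOST07 comes from the text. It contrasts one write per resident in StudentDetail with a single write in a separate Hostel relation.

```python
# Hypothetical StudentDetail rows: attribute names and values are
# illustrative assumptions, not the table from the original slide.
student_detail = [
    {"regNo": "S01", "hostelID": "HOST07", "hAddress": "Plot 1, Old Road"},
    {"regNo": "S02", "hostelID": "HOST07", "hAddress": "Plot 1, Old Road"},
    {"regNo": "S03", "hostelID": "HOST02", "hAddress": "Plot 9, Hill Lane"},
]

# In the decomposed design the address is stored once per hostel.
hostel = {
    "HOST07": {"hAddress": "Plot 1, Old Road"},
    "HOST02": {"hAddress": "Plot 9, Hill Lane"},
}


def update_address_everywhere(rows, hostel_id, new_address):
    """Update every StudentDetail tuple for one hostel; return how many were touched."""
    changed = 0
    for row in rows:
        if row["hostelID"] == hostel_id:
            row["hAddress"] = new_address
            changed += 1
    return changed


def addresses_of(rows, hostel_id):
    """All distinct addresses recorded for one hostel (should be exactly one)."""
    return {row["hAddress"] for row in rows if row["hostelID"] == hostel_id}


# A careless update that touches only the first tuple leaves HOST07
# with two conflicting addresses: the modification anomaly.
student_detail[0]["hAddress"] = "Plot 5, New Road"
print(len(addresses_of(student_detail, "HOST07")))  # 2

# Doing it correctly in StudentDetail needs one write per resident...
print(update_address_everywhere(student_detail, "HOST07", "Plot 5, New Road"))  # 2
# ...whereas the Hostel relation needs a single write.
hostel["HOST07"]["hAddress"] = "Plot 5, New Road"
```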

Slide 12/48

Update Anomalies (cont)

(b) Insertion anomalies
– Every time we insert the details of a new student into the StudentDetail relation, we must also include the details of the hostel at which the student is to reside, and these must be consistent with the values already stored for that hostel in other tuples.
– To insert the details of a new hostel that currently has no students into the StudentDetail relation, we must enter nulls into the student attributes, including the primary key, which violates entity integrity.

(c) Deletion anomalies
– If we delete a tuple from the StudentDetail relation that represents the last student residing at a hostel, the details about that hostel are also lost from the database.
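Here is a minimal Python sketch of the deletion anomaly. The rows and the attribute names (regNo, hostelID, hName) are hypothetical stand-ins for the missing StudentDetail table; the point is only to contrast one wide relation with separate Student and Hostel relations.

```python
# Hypothetical rows; the attribute names (regNo, hostelID, hName) are assumptions.
student_detail = [
    {"regNo": "S01", "hostelID": "HOST07", "hName": "Lakeview"},
    {"regNo": "S03", "hostelID": "HOST02", "hName": "Hillside"},
]

# Decomposed design: hostel facts live in their own relation.
hostel = {"HOST07": "Lakeview", "HOST02": "Hillside"}
student = {"S01": "HOST07", "S03": "HOST02"}  # regNo -> hostelID (foreign key)


def known_hostels(rows):
    """Hostels that the single StudentDetail relation can still describe."""
    return {row["hostelID"] for row in rows}


# Delete S03, the last (and only) student living in HOST02.
student_detail = [row for row in student_detail if row["regNo"] != "S03"]
del student["S03"]

print(known_hostels(student_detail))  # {'HOST07'}: HOST02's details are gone
print("HOST02" in hostel)             # True: the Hostel relation keeps them
```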

Slide 13/48

Properties associated with decomposition of relations

• There are two important properties associated with the decomposition of a larger relation into smaller relations:

– The lossless-join property enables us to find any instance of the original relation from the corresponding instances in the smaller relations.

– The dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
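For a decomposition of a relation R into two relations R1 and R2, a standard test (not stated on the slide) is that the decomposition is lossless if the attributes common to R1 and R2 functionally determine all of R1 or all of R2. The Python sketch below implements that test with the attribute-closure algorithm. The helper names fd, closure and is_lossless_binary are our own, and the example reuses the PropertyOwner attributes and dependencies that appear later in this presentation.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def fd(lhs: str, rhs: str) -> FD:
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1: Iterable[str], r2: Iterable[str], fds: List[FD]) -> bool:
    """Lossless w.r.t. fds if the shared attributes determine all of R1 or all of R2."""
    r1, r2 = set(r1), set(r2)
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


FDS = [
    fd("propertyNo", "pAddress rent ownerNo oName"),
    fd("ownerNo", "oName"),
]

# PropertyOwner split on ownerNo: lossless.
print(is_lossless_binary({"propertyNo", "pAddress", "rent", "ownerNo"},
                         {"ownerNo", "oName"}, FDS))  # True
# Split on oName instead: owners sharing a name would produce spurious tuples.
print(is_lossless_binary({"propertyNo", "pAddress", "rent", "oName"},
                         {"ownerNo", "oName"}, FDS))  # False
```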

Slide 14/48

The concept of Functional Dependencies

• Functional dependency
– Describes the relationship between attributes in a relation.
– If A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B.

• The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow.

Slide 15/48

Please Note:

Functional dependency is one of the main concepts associated with normalization; it describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)

Slide 16/48

Types of Functional Dependencies

• Full functional dependency: If A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

• Partial functional dependency: A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and yet the dependency still holds.

• Transitive dependency: A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
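Whether a dependency is full or partial can be decided with the standard attribute-closure algorithm: compute everything a set of attributes determines, then test each proper subset of the determinant. The Python sketch below is one way to do this. The relation R(A, B, C, D) and its dependencies are hypothetical, and closure and classify are our own helper names.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def classify(lhs: Iterable[str], rhs: Iterable[str], fds: List[FD]) -> str:
    """Label lhs -> rhs as 'full' or 'partial'; raise if it does not hold at all."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("lhs does not functionally determine rhs")
    for size in range(len(lhs)):  # sizes 0 .. len-1: every proper subset of lhs
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(subset, fds):
                return "partial"  # a proper subset already determines rhs
    return "full"


# Hypothetical relation R(A, B, C, D) with AB -> C and A -> D
# (single-letter attribute names, so frozenset("AB") == {"A", "B"}).
FDS = [(frozenset("AB"), frozenset("C")), (frozenset("A"), frozenset("D"))]

print(classify("AB", "C", FDS))  # full
print(classify("AB", "D", FDS))  # partial
```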

Slide 17/48

Characteristics of functional dependencies used in normalisation

1. There is a one-to-one relationship between the attribute(s) on the left-hand side (determinant) and those on the right-hand side of a functional dependency. (Note that the relationship in the opposite direction, that is from the right- to the left-hand side attributes, can be a one-to-one relationship or one-to-many relationship.)

2. They hold for all time. For example, the dependency SName → RegNo does not hold for all time, because two students may share the same name, but RegNo → SName holds for all time.

3. The determinant has the minimal number of attributes necessary to maintain the dependency with the attribute(s) on the right-hand side. In other words, there must be a full functional dependency between the attribute(s) on the left- and right-hand sides of the dependency.
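The RegNo/SName example in point 2 can be checked against sample data with a short Python function. Note the limitation built into characteristic 2: a sample can only show that a dependency does not hold (as SName → RegNo fails below); it can never prove that one holds for all time. The rows are hypothetical and fd_holds is our own helper name.

```python
from typing import Dict, Iterable, List, Sequence


def fd_holds(rows: Iterable[Dict[str, str]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if, in these rows, each lhs value is associated with exactly one rhs value."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, two different dependent values
    return True


# Hypothetical student rows (illustrative only).
students: List[Dict[str, str]] = [
    {"RegNo": "R001", "SName": "Okello"},
    {"RegNo": "R002", "SName": "Nakato"},
    {"RegNo": "R003", "SName": "Okello"},  # a second student with the same name
]

print(fd_holds(students, ["RegNo"], ["SName"]))  # True
print(fd_holds(students, ["SName"], ["RegNo"]))  # False
```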

Slide 18/48

The process of normalization

• A formal process for analyzing a relation based on its primary key and the functional dependencies between its attributes.
• Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.
• As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

• The four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF). Other forms include 4NF, 5NF and higher normal forms.

Slide 19/48

The process of normalization cont’d

Slide 20/48

Unnormalized Form (UNF)

• A table that contains one or more repeating groups.
– Note: A repeating group is an attribute or group of attributes within a table that occurs with multiple values for a single occurrence of the nominated key attribute(s) for that table. For example, a book with multiple authors, a client with many properties, etc.

• To create an unnormalized table:
– transform the data from the information source (e.g. a form) into table format with columns and rows.

Slide 21/48

Unnormalized Form (UNF) cont’d

Stage 1: Paper Forms (Data Source)

Stage 2: Table in UNF

Slide 22/48

First Normal Form (1NF)

• First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value.

• A table is in First Normal Form (1NF) if all its attributes are atomic, i.e. considered to be indivisible units.

• It should have no composite attributes, no multivalued attributes and no repeating groups.
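A rough, automated first check for 1NF is to look for cells that hold a collection rather than a single value. The Python sketch below does that for rows stored as dictionaries; the book rows are hypothetical, echoing the multiple-authors example of a repeating group. It only catches the obvious cases: whether a plain string such as an address is atomic is a design decision that code cannot make.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
COLLECTIONS = (list, tuple, set, frozenset, dict)  # value types treated as non-atomic


def non_atomic_cells(rows: List[Row]) -> List[Tuple[int, str]]:
    """Return (row index, attribute) for every cell that holds a collection."""
    return [
        (i, attr)
        for i, row in enumerate(rows)
        for attr, value in row.items()
        if isinstance(value, COLLECTIONS)
    ]


# Hypothetical rows: the second book has several authors in one cell.
books: List[Row] = [
    {"isbn": "B1", "title": "Book One", "authors": "A. Author"},
    {"isbn": "B2", "title": "Book Two", "authors": ["B. Author", "C. Author"]},
]
print(non_atomic_cells(books))  # [(1, 'authors')]
```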

Slide 23/48

Please Note:

First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value.

Slide 24/48

UNF to 1NF (two approaches)

If a table is not in 1NF:
• First, nominate an attribute or group of attributes to act as the primary key for the unnormalized table, then use either of the approaches below.

Approach 1:
• Identify the repeating group(s) in the unnormalized table that repeat for the key attribute(s).
• Flatten the table: place each value of a repeating group on its own tuple, together with duplicate copies of the non-repeating data.
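Approach 1 can be sketched in Python as follows, with each UNF row held as a dictionary and the repeating group as a list. The client/rental rows, the attribute names and the flatten helper are hypothetical. Note how cName is copied onto every flattened tuple: this is exactly the redundancy that the later normal forms remove.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical UNF rows: "rentals" is a repeating group for each clientNo.
unf: List[Row] = [
    {"clientNo": "C1", "cName": "Client One",
     "rentals": [{"propertyNo": "P1", "rent": 350},
                 {"propertyNo": "P2", "rent": 450}]},
    {"clientNo": "C2", "cName": "Client Two",
     "rentals": [{"propertyNo": "P1", "rent": 350}]},
]


def flatten(rows: List[Row], group: str) -> List[Row]:
    """Approach 1: one tuple per repeating-group value, non-repeating data copied."""
    flat: List[Row] = []
    for row in rows:
        non_repeating = {k: v for k, v in row.items() if k != group}
        for member in row[group]:
            flat.append({**non_repeating, **member})
    return flat


for t in flatten(unf, "rentals"):
    print(t)
# {'clientNo': 'C1', 'cName': 'Client One', 'propertyNo': 'P1', 'rent': 350}
# {'clientNo': 'C1', 'cName': 'Client One', 'propertyNo': 'P2', 'rent': 450}
# {'clientNo': 'C2', 'cName': 'Client Two', 'propertyNo': 'P1', 'rent': 350}
```

In this sketch the flattened relation would need a new primary key, such as (clientNo, propertyNo), because clientNo alone no longer identifies a tuple.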

Slide 25/48

UNF to 1NF (Approach 1) cont’d

Stage 2: Table in UNF

Stage 3: Table in 1NF

Slide 26/48

UNF to 1NF (two approaches)

Approach 2:
• Make a new table to cater for the multivalued attributes.
• Place the repeating group, along with a copy of the original primary key attribute(s), into a separate relation.
• The new primary key is usually a combination of the (multivalued) attribute and the primary key of the parent table.
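Approach 2 can be sketched in Python like this. The rows and attribute names are hypothetical, split_repeating_group is our own helper, and the assertion at the end checks that the combination (clientNo, propertyNo) is a valid primary key for the new relation.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical UNF rows: "rentals" is a repeating group for each clientNo.
unf: List[Row] = [
    {"clientNo": "C1", "cName": "Client One",
     "rentals": [{"propertyNo": "P1", "rent": 350},
                 {"propertyNo": "P2", "rent": 450}]},
    {"clientNo": "C2", "cName": "Client Two",
     "rentals": [{"propertyNo": "P1", "rent": 350}]},
]


def split_repeating_group(rows: List[Row], key: str, group: str) -> Tuple[List[Row], List[Row]]:
    """Approach 2: move the repeating group, plus a copy of the key, to its own relation."""
    parent: List[Row] = []
    child: List[Row] = []
    for row in rows:
        parent.append({k: v for k, v in row.items() if k != group})
        for member in row[group]:
            child.append({key: row[key], **member})
    return parent, child


client, client_rental = split_repeating_group(unf, "clientNo", "rentals")
print(client)
# [{'clientNo': 'C1', 'cName': 'Client One'}, {'clientNo': 'C2', 'cName': 'Client Two'}]
print(sorted((r["clientNo"], r["propertyNo"]) for r in client_rental))
# [('C1', 'P1'), ('C1', 'P2'), ('C2', 'P1')]

# The new composite key (clientNo, propertyNo) must be unique in the child relation.
keys = [(r["clientNo"], r["propertyNo"]) for r in client_rental]
assert len(keys) == len(set(keys))
```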

Slide 27/48

UNF to 1NF (two approaches)

Approach 2:

Stage 2: Table in UNF

Stage 3: Two Tables in 1NF

Slide 28/48

Second Normal Form (2NF)

• Second Normal Form (2NF) is based on the concept of full functional dependency. Second Normal Form applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes.

• The normalization of 1NF relations to 2NF involves the removal of partial dependencies.

• If a partial dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with a copy of their determinant.

Slide 29/48

Please Note:

Second Normal Form (2NF) is a relation that is in First Normal Form and every non-primary-key attribute is fully functionally dependent on the primary key. Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.

Slide 30/48

Second Normal Form (2NF) cont’d

• To change the 1NF ClientRental relation to 2NF, we begin by identifying any partial dependencies on the primary key.

Slide 31/48

Second Normal Form (2NF) cont’d

The ClientRental relation has the following functional dependencies:
• fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)
• fd2 clientNo → cName (Partial dependency)
NB: cName is partially dependent on the primary key, because it depends only on the clientNo attribute and not on the whole primary key (the clientNo and propertyNo combination).
• fd3 propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)
• fd4 ownerNo → oName (Transitive dependency)
• fd5 clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)
• fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
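The partial dependencies can be found mechanically by computing the closure of each part of the primary key, which is one way to derive the three 2NF relations described on the next slide. This Python sketch uses only fd1 to fd4 (fd5 and fd6 both need rentStart on their left-hand side, so they never fire from a single key attribute and do not change the result). The helper names fd and closure are our own, and looping over single attributes only works because this key has exactly two attributes; a larger composite key would need every proper subset.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def fd(lhs: str, rhs: str) -> FD:
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


ATTRS = set("clientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName".split())
PK = {"clientNo", "propertyNo"}
FDS = [
    fd("clientNo propertyNo", "rentStart rentFinish"),  # fd1
    fd("clientNo", "cName"),                            # fd2
    fd("propertyNo", "pAddress rent ownerNo oName"),    # fd3
    fd("ownerNo", "oName"),                             # fd4
]

new_relations: Dict[str, Set[str]] = {}
moved: Set[str] = set()
for part in sorted(PK):  # the proper, non-empty subsets of a two-attribute key
    dependents = set(closure({part}, FDS)) - PK  # determined by part of the key
    if dependents:
        new_relations[part] = {part} | dependents  # new relation plus its determinant
        moved |= dependents

rental = ATTRS - moved  # attributes that stay with the whole primary key
for determinant, attrs in new_relations.items():
    print(determinant, sorted(attrs))
print("rental", sorted(rental))
```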

Slide 32/48

Second Normal Form (2NF) cont’d

• The identification of partial dependencies within the ClientRental relation indicates that the relation is not in 2NF.
• Transforming the ClientRental relation into 2NF requires the creation of new relations, so that the non-primary-key attributes are removed along with a copy of the part of the primary key on which they are fully functionally dependent.

• This results in the creation of three new relations called Client, Rental, and PropertyOwner, with the following form:

• Client (clientNo, cName)
• Rental (clientNo, propertyNo, rentStart, rentFinish)
• PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

• These three relations are now in Second Normal Form, as every non-primary-key attribute is fully functionally dependent on the primary key of its relation. See the next slide for an illustration.

Slide 33/48

1NF to 2NF illustration

Stage 3: Two Tables in 1NF

Stage 4: Three Tables in 2NF

Slide 34/48

Third Normal Form (3NF)

• 2NF relations may still suffer from update anomalies. For example, if we want to update the name of an owner, such as Tony Shaw (ownerNo CO93), we have to update two tuples in the PropertyOwner relation.

• This update anomaly is caused by a transitive dependency.

• The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.

• If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant.

Slide 35/48

Please Note:

Third Normal Form (3NF) is a relation that is in First and Second Normal Form, and in which no non-primary-key attribute is transitively dependent on the primary key. Transitive dependency is a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B.

Slide 36/48

Third Normal Form (3NF) cont’d

• In our previous example, the Client and Rental relations have no transitive dependencies and are therefore already in 3NF.

• However, in the PropertyOwner relation, all the non-primary-key attributes are directly dependent on the primary key, with the exception of oName, which is transitively dependent on propertyNo via ownerNo (represented as fd4: ownerNo → oName).

• To transform the PropertyOwner relation into 3NF, we must first remove this transitive dependency by creating two new relations called PropertyForRent and Owner, in the form:

• PropertyForRent (propertyNo, pAddress, rent, ownerNo)
• Owner (ownerNo, oName)
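The 3NF step can be sketched in Python as a check plus a split. The check flags any dependency inside a relation whose determinant is not a superkey but which determines non-key attributes; under our simplifying assumption that the primary key is the only candidate key, this is a 3NF test (it would also flag partial dependencies in a relation that is not yet in 2NF). Applied to PropertyOwner with fd1 to fd4, it finds fd4 and produces PropertyForRent and Owner. The helper names are our own.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def fd(lhs: str, rhs: str) -> FD:
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def third_nf_violations(attrs: Iterable[str], key: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """FDs X -> Y inside attrs where X is not a superkey and Y contains non-key attributes.

    Simplification: assumes the primary key is the only candidate key.
    """
    attrs = frozenset(attrs)
    found: List[FD] = []
    for lhs, rhs in fds:
        if not lhs <= attrs or attrs <= closure(lhs, fds):
            continue  # FD not inside this relation, or lhs is a superkey
        dependents = (rhs & attrs) - lhs - key
        if dependents:
            found.append((lhs, dependents))
    return found


FDS = [
    fd("clientNo propertyNo", "rentStart rentFinish"),  # fd1
    fd("clientNo", "cName"),                            # fd2
    fd("propertyNo", "pAddress rent ownerNo oName"),    # fd3
    fd("ownerNo", "oName"),                             # fd4
]
PROPERTY_OWNER = frozenset("propertyNo pAddress rent ownerNo oName".split())
PK = frozenset({"propertyNo"})

violations = third_nf_violations(PROPERTY_OWNER, PK, FDS)
lhs, dependents = violations[0]  # the only violation in this relation: fd4
print(sorted(lhs), "->", sorted(dependents))  # ['ownerNo'] -> ['oName']

# Move the transitively dependent attributes out with a copy of their determinant.
property_for_rent = PROPERTY_OWNER - dependents
owner = lhs | dependents
print(third_nf_violations(property_for_rent, PK, FDS))  # []
print(third_nf_violations(owner, lhs, FDS))             # []
```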

Slide 37/48

2NF to 3NF illustration

Stage 4: Three Tables in 2NF

Stage 5: Four Tables in 3NF

Slide 38/48

Lossless-join decomposition

• The 1NF ClientRental relation has been decomposed into four 3NF relations.

• The normalization process has decomposed the original ClientRental relation using a series of relational algebra projections. This results in a lossless-join decomposition, which is reversible using the natural join operation.
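The claim that the decomposition is reversible can be checked on sample data. The Python sketch below represents each relation as a set of tuples, projects some hypothetical ClientRental rows onto the four 3NF schemas (Client, Rental, PropertyForRent and Owner), and natural-joins the results back together. project and natural_join are our own minimal implementations of the relational operators, not a library API, and the row values are invented for illustration.

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Tup = FrozenSet[Tuple[str, object]]  # one tuple as a set of (attribute, value) pairs


def project(rows: Iterable[Dict[str, object]], attrs: Iterable[str]) -> Set[Tup]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    attrs = list(attrs)
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r: Set[Tup], s: Set[Tup]) -> Set[Tup]:
    """Combine every pair of tuples that agree on all attributes they share."""
    out: Set[Tup] = set()
    for t in r:
        td = dict(t)
        for u in s:
            ud = dict(u)
            if all(td[a] == ud[a] for a in td.keys() & ud.keys()):
                out.add(frozenset({**td, **ud}.items()))
    return out


# Hypothetical ClientRental rows (owner O1 owns both properties).
client_rental = [
    {"clientNo": "C1", "cName": "Client A", "propertyNo": "P1", "pAddress": "1 Main St",
     "rentStart": "2023-01-01", "rentFinish": "2023-06-30", "rent": 350,
     "ownerNo": "O1", "oName": "Owner X"},
    {"clientNo": "C1", "cName": "Client A", "propertyNo": "P2", "pAddress": "2 Side Rd",
     "rentStart": "2023-07-01", "rentFinish": "2023-12-31", "rent": 450,
     "ownerNo": "O1", "oName": "Owner X"},
    {"clientNo": "C2", "cName": "Client B", "propertyNo": "P1", "pAddress": "1 Main St",
     "rentStart": "2024-01-01", "rentFinish": "2024-06-30", "rent": 350,
     "ownerNo": "O1", "oName": "Owner X"},
]

rental = project(client_rental, ["clientNo", "propertyNo", "rentStart", "rentFinish"])
client = project(client_rental, ["clientNo", "cName"])
property_for_rent = project(client_rental, ["propertyNo", "pAddress", "rent", "ownerNo"])
owner = project(client_rental, ["ownerNo", "oName"])

original = project(client_rental, client_rental[0].keys())
rejoined = reduce(natural_join, [rental, client, property_for_rent, owner])
print(rejoined == original)  # True: no tuples lost, no spurious tuples
print(len(owner))            # 1: the owner's name is now stored once
```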

Slide 39/48

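General Definitions of 2NF and 3NF

• The earlier definitions of 2NF and 3NF do not take into account other candidate keys of a relation, if any exist.

• Below, we present more general definitions that take into account the candidate keys of a relation:
– Second Normal Form (2NF): a relation that is in First Normal Form and in which every non-candidate-key attribute is fully functionally dependent on any candidate key.
– Third Normal Form (3NF): a relation that is in First and Second Normal Form and in which no non-candidate-key attribute is transitively dependent on any candidate key.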

Slide 40/48

Higher Normal Forms

• We have described the three most commonly used normal forms: 1NF, 2NF, and 3NF.
• However, R. Boyce and E.F. Codd identified a weakness in 3NF and introduced a stronger definition of 3NF called Boyce–Codd Normal Form (BCNF).

• Higher normal forms that go beyond BCNF were introduced later, such as Fourth (4NF) and Fifth (5NF) Normal Forms (Fagin, 1977, 1979).

• However, these later normal forms deal with situations that are very rare, so we look only briefly at BCNF.

Slide 41/48

Boyce–Codd Normal Form (BCNF)

• We have seen that 2NF and 3NF disallow partial and transitive dependencies, respectively, on the primary key of a relation.

• However, 2NF and 3NF do not consider whether such dependencies remain on other candidate keys of a relation, if any exist.

• Dependencies whose determinant is not a candidate key can still cause redundancy in a 3NF relation. This weakness of 3NF led to the presentation of a stronger normal form called Boyce–Codd Normal Form (Codd, 1974).

Slide 42/48

Please Note:

Boyce–Codd Normal Form (BCNF) is a relation in which every determinant is a candidate key.
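A BCNF check in this spirit can be sketched in Python: for every non-trivial dependency that lies within a relation, test whether its determinant functionally determines the whole relation (that is, whether it is a superkey). Testing for a superkey rather than a minimal candidate key is the usual formal version of the rule. The R(A, B, C) example is hypothetical and shows a relation that is in 3NF but not in BCNF. The sketch only checks the dependencies it is given, not every dependency they imply.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(attrs: Iterable[str], fds: List[FD]) -> List[FD]:
    """Non-trivial FDs inside attrs whose determinant is not a superkey of attrs."""
    attrs = frozenset(attrs)
    found: List[FD] = []
    for lhs, rhs in fds:
        dependents = (rhs & attrs) - lhs
        if not lhs <= attrs or not dependents:
            continue  # FD lies outside this relation, or is trivial here
        if not attrs <= closure(lhs, fds):
            found.append((lhs, dependents))
    return found


# Hypothetical R(A, B, C) with AB -> C and C -> B (single-letter names).
# Candidate keys are AB and AC, so every attribute is prime and R is in 3NF,
# yet C is a determinant that is not a candidate key.
R = "ABC"
R_FDS = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print(bcnf_violations(R, R_FDS))  # [(frozenset({'C'}), frozenset({'B'}))]

# Owner(ownerNo, oName): its only determinant is the key, so it is in BCNF.
OWNER_FDS = [(frozenset({"ownerNo"}), frozenset({"oName"}))]
print(bcnf_violations({"ownerNo", "oName"}, OWNER_FDS))  # []
```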

Slide 43/48

PRACTICE EXERCISES

1. Given the following data source, use normalization as a bottom-up technique to create a suitable set of relations in 3NF.

Slide 44/48

Solution - Exercise One

• 0NF
– ORDER(order#, customer#, name, street, city, country, orderdate, (product#, description, quantity, unitprice))

• 1NF
– ORDER(order#, customer#, name, street, city, country, orderdate)
– PRODUCT_ORDER(order#, product#, description, quantity, unitprice)

• 2NF
– ORDER(order#, customer#, name, street, city, country, orderdate)
– PRODUCT_ORDER(order#, product#, quantity)
– PRODUCT(product#, description, unitprice)

• 3NF
– ORDER(order#, customer#, orderdate)
– CUSTOMER(customer#, name, street, city, country)
– PRODUCT_ORDER(order#, product#, quantity)
– PRODUCT(product#, description, unitprice)
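As a sketch of how the 3NF solution might be implemented, the block below creates the four relations in an in-memory SQLite database using Python's standard sqlite3 module, with foreign keys linking them. The column names (order_no for order#, and so on), the data types, the table name orders (ORDER is a reserved word in SQL) and the sample rows are all our own assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    street      TEXT,
    city        TEXT,
    country     TEXT
);
CREATE TABLE orders (             -- ORDER is a reserved word in SQL
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES customer (customer_no),
    order_date  TEXT NOT NULL
);
CREATE TABLE product (
    product_no  INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    unit_price  REAL NOT NULL
);
CREATE TABLE product_order (      -- one row per product line on an order
    order_no    INTEGER NOT NULL REFERENCES orders (order_no),
    product_no  INTEGER NOT NULL REFERENCES product (product_no),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_no, product_no)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(SCHEMA)

# Hypothetical sample data.
conn.execute("INSERT INTO customer VALUES (1, 'Customer One', '1 Main St', 'Kampala', 'Uganda')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2023-04-17')")
conn.execute("INSERT INTO product VALUES (7, 'Notebook', 2.5)")
conn.execute("INSERT INTO product_order VALUES (100, 7, 4)")

# Joining on the shared keys rebuilds the original order view.
row = conn.execute("""
    SELECT o.order_no, c.name, p.description, po.quantity * p.unit_price
    FROM orders o
    JOIN customer c       ON c.customer_no = o.customer_no
    JOIN product_order po ON po.order_no   = o.order_no
    JOIN product p        ON p.product_no  = po.product_no
""").fetchone()
print(row)  # (100, 'Customer One', 'Notebook', 10.0)
```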

Slide 45/48

PRACTICE EXERCISES cont’d

2. Using the given functional dependencies, fully normalize the EMPLOYEE_CONTRACT relation.

• EMPLOYEE_CONTRACT (staffNo, contractNo, hours, staffName, branchNo, branchName)

• fd1: staffNo, contractNo → hours, staffName, branchNo, branchName
• fd2: staffNo → staffName
• fd3: contractNo → branchNo, branchName
• fd4: branchNo → branchName

Slide 46/48

PRACTICE EXERCISES cont’d

3. (a) The following table is susceptible to update anomalies. Provide examples of insertion, deletion, and modification anomalies.

(b) Using the primary key and functional dependency concepts, normalize the tables below to 3NF.

Slide 47/48

Review Questions

1. Describe the purpose of normalizing data.
2. Discuss the alternative ways that normalization can be used to support database design.
3. Describe the types of update anomaly that may occur on a relation that has redundant data.
4. Describe the concept of functional dependency.
5. What are the main characteristics of functional dependencies that are used for normalization?
6. Describe how a database designer typically identifies the set of functional dependencies associated with a relation.
7. Describe the characteristics of a table in Unnormalized Form (UNF) and describe how such a table is converted to a First Normal Form (1NF) relation.
8. What is the minimal normal form that a relation must satisfy? Provide a definition for this normal form.
9. Describe the two approaches to converting an Unnormalized Form (UNF) table to First Normal Form (1NF) relation(s).
10. Describe the concept of full functional dependency and describe how this concept relates to 2NF. Provide an example to illustrate your answer.
11. Describe the concept of transitive dependency and describe how this concept relates to 3NF.
12. Discuss how the definitions of 2NF and 3NF based on primary keys differ from the general definitions of 2NF and 3NF.

Slide 48/48