Chapter 8: Relational Database Design

Normalization


DESCRIPTION

CSCS-433 Database Management System course content


Page 1: Normalization

Chapter 8: Relational Database Design

Page 2: Normalization

Combine Schemas?

Suppose we combine instructor and department into inst_dept

(No connection to relationship set inst_dept)

Result is possible repetition of information
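The repetition can be seen in a minimal sketch. The attribute names and sample rows below are assumptions made for illustration; none of them come from the slides.

```python
# Hypothetical sample data; attribute names and values are assumptions.
instructor = [
    {"id": 1, "name": "Ann", "dept_name": "Physics"},
    {"id": 2, "name": "Bob", "dept_name": "Physics"},
    {"id": 3, "name": "Cy", "dept_name": "Music"},
]
department = {
    "Physics": {"building": "Watson", "budget": 70000},
    "Music": {"building": "Packard", "budget": 80000},
}

# Combine the two schemas into one wide relation, inst_dept.
inst_dept = [{**row, **department[row["dept_name"]]} for row in instructor]

# Count how many times each department's building/budget pair is stored.
copies = {}
for row in inst_dept:
    copies[row["dept_name"]] = copies.get(row["dept_name"], 0) + 1

print(copies)  # {'Physics': 2, 'Music': 1}
```

Every instructor in the same department carries another copy of that department's building and budget, which is the repetition the slide warns about.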

Page 3: Normalization

Normalization

Database normalization is a technique for organizing the data in a database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion, update, and deletion anomalies.

It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Normalization serves two main purposes:

Eliminating redundant (useless) data.

Ensuring data dependencies make sense, i.e. data is logically stored.

Page 4: Normalization

Problem Without Normalization

Without normalization, it becomes difficult to handle and update the database without risking data loss. Insertion, update, and deletion anomalies are very frequent if the database is not normalized.

To understand these anomalies, let us take the example of a Student table.

Page 5: Normalization

Problem Without Normalization

Update anomaly: To update the address of a student who occurs twice or more in the table, we have to update the S_Address column in all of those rows, or else the data becomes inconsistent.

Insertion anomaly: Suppose that for a new admission we have the student's id (S_id), name, and address, but the student has not opted for any subjects yet. We then have to insert NULL for the subject, leading to an insertion anomaly.

Deletion anomaly: If student (S_id) 401 has only one subject and temporarily drops it, deleting that row deletes the entire student record along with it.
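The three anomalies can be reproduced in a small sketch using Python's built-in sqlite3 module. The slide's Student table is not available here, so the S_Name and Subject_opted columns and all of the sample rows are assumptions; only S_id and S_Address are named in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject_opted TEXT)"
)
# Hypothetical rows: student 402 appears twice, student 401 has one subject.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (401, "Asha", "Pune", "Biology"),
        (402, "Ravi", "Delhi", "Maths"),
        (402, "Ravi", "Delhi", "Physics"),
    ],
)

# Update anomaly: changing the address in only one of the two rows
# leaves student 402 with two different addresses.
conn.execute(
    "UPDATE Student SET S_Address = 'Mumbai' "
    "WHERE S_id = 402 AND Subject_opted = 'Maths'"
)
addresses_402 = conn.execute(
    "SELECT COUNT(DISTINCT S_Address) FROM Student WHERE S_id = 402"
).fetchone()[0]

# Insertion anomaly: a new student without a subject forces a NULL.
conn.execute("INSERT INTO Student VALUES (403, 'Meera', 'Goa', NULL)")
null_subjects = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Subject_opted IS NULL"
).fetchone()[0]

# Deletion anomaly: dropping 401's only subject removes the student entirely.
conn.execute("DELETE FROM Student WHERE S_id = 401 AND Subject_opted = 'Biology'")
rows_401 = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE S_id = 401"
).fetchone()[0]

print(addresses_402, null_subjects, rows_401)  # 2 1 0
```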

Page 6: Normalization

Normalization Techniques

Normalization rules are divided into the following normal forms:

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

BCNF

First Normal Form

Atomic: A domain is atomic if its elements are considered to be indivisible units.

Examples of non-atomic domains: sets of names, composite attributes, and identification numbers like CS101 that can be broken up into parts.

Non-atomic values complicate storage and encourage redundant (repeated) storage of data. Example: Set of accounts stored with each customer, and set of owners stored with each account

Page 7: Normalization

First Normal Form (Cont.)

A relational schema R is in first normal form if the domains of all attributes of R are atomic.

As per First Normal Form, no two rows of data may contain a repeating group of information; that is, each set of columns must have a unique value, so that multiple columns cannot be used to fetch the same row.

Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.

The Primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key.

Page 8: Normalization

First Normal Form (Cont.)

For example, consider a table which is not in First Normal Form.

In First Normal Form, no row may have a column that holds more than one value, such as a comma-separated list. Instead, we must split such data into multiple rows.

Page 9: Normalization

First Normal Form (Cont.)

Under First Normal Form, data redundancy increases, because the same values are repeated in several columns across multiple rows, but each row as a whole is unique.
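As a rough sketch of the conversion to First Normal Form, the function below splits a comma-separated column into one row per value. The table from the slide is not available, so the column names and rows are assumptions.

```python
# Hypothetical un-normalized rows: the "subjects" column holds several values.
unnormalized = [
    {"student": "Adam", "age": 15, "subjects": "Biology, Maths"},
    {"student": "Alex", "age": 14, "subjects": "Maths"},
]


def to_first_normal_form(rows):
    """Emit one row per (student, subject) pair so every value is atomic."""
    result = []
    for row in rows:
        for subject in row["subjects"].split(","):
            result.append(
                {
                    "student": row["student"],
                    "age": row["age"],
                    "subject": subject.strip(),
                }
            )
    return result


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Adam's age is now stored in two rows, which is the added redundancy mentioned above, while each (student, subject) pair remains unique.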

Page 10: Normalization

Second Normal Form

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

Create relationships between these new tables and their predecessors through the use of foreign keys.

Formally, a table is in Second Normal Form when it is in First Normal Form and no non-prime attribute depends on only part of a candidate key.

A table in Second Normal Form can still suffer update anomalies in a few more complex cases; Third Normal Form exists to handle those.

First Name | Last Name | Address | City | State | Zip
Lisa | Hestings | Bertha Street | Miami | FL | 33157
Adam | Gabriel | Fleming Street | Miami | FL | 33157
Lucy | Herts | Bridge Road | Sea Cliff | NY | 11579

Page 11: Normalization

Second Normal Form

A brief look at this table reveals a small amount of redundant data. In the rows shown, the "Miami, FL 33157" combination is stored twice, and a combination such as "Sea Cliff, NY 11579" would be repeated in the same way for every further customer in that ZIP code.

Additionally, if the ZIP code for Miami were to change, we'd need to make that change in many places throughout the database.

In a 2NF-compliant database structure, this redundant information is extracted and stored in a separate table. Our new table (let's call it ZIPs) might have the following columns:

Zip | City | State

We'll need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary key of the ZIPs table) to create that relationship. Here's our new Customers table:

First Name | Last Name | Address | Zip
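The ZIPs and Customers tables can be sketched with Python's built-in sqlite3 module. The SQL column names (FirstName, LastName) and the rejected sample customer are assumptions made for illustration; the other rows come from the table shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE ZIPs (Zip TEXT PRIMARY KEY, City TEXT, State TEXT)")
conn.execute(
    "CREATE TABLE Customers ("
    "FirstName TEXT, LastName TEXT, Address TEXT, "
    "Zip TEXT REFERENCES ZIPs(Zip))"
)

# Each city/state pair is now stored once, keyed by ZIP code.
conn.executemany(
    "INSERT INTO ZIPs VALUES (?, ?, ?)",
    [("33157", "Miami", "FL"), ("11579", "Sea Cliff", "NY")],
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("Lisa", "Hestings", "Bertha Street", "33157"),
        ("Adam", "Gabriel", "Fleming Street", "33157"),
        ("Lucy", "Herts", "Bridge Road", "11579"),
    ],
)

# A join through the foreign key rebuilds the original wide rows.
rows = conn.execute(
    "SELECT c.FirstName, c.LastName, c.Address, z.City, z.State, z.Zip "
    "FROM Customers AS c JOIN ZIPs AS z ON z.Zip = c.Zip "
    "ORDER BY c.FirstName"
).fetchall()
for row in rows:
    print(row)

# A customer whose ZIP code is missing from ZIPs is rejected (hypothetical row).
try:
    conn.execute("INSERT INTO Customers VALUES ('Sam', 'Doe', 'Elm Street', '00000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With this layout, a change to a city or state is made in one ZIPs row instead of in every matching customer row.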

Page 12: Normalization

Third Normal Form

Third Normal Form requires that every non-prime attribute of a table depend directly on the primary key, not on another non-prime attribute.

The table must already be in Second Normal Form, and any transitive functional dependency must be removed from it. For example, consider a table with the following fields.

Order No | Customer No | Unit Price | Quantity | Total
123J09 | NY65031 | 500 $ | 2 | 1000 $
120J11 | ST90452 | 300 $ | 1 | 300 $
123J09 | NY65031 | 100 $ | 4 | 400 $

Now, are all of the columns fully dependent upon the primary key?

The customer number varies with the order number and doesn't appear to depend upon any of the other fields.

It appears that we sometimes charge the same customer different prices. The quantity of items also varies from order to order. So the unit price and quantity are fully dependent upon the order number.

Page 13: Normalization

Third Normal Form

What about the total?

The total can be derived by multiplying the unit price by the quantity, so it is not fully dependent upon the primary key. We must remove it from the table to comply with Third Normal Form.

Order No Customer No Price

Price Unit Price Quantity Total
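A short sketch of deriving the total on demand instead of storing it. The rows are taken from the order table above; the dictionary keys and the function name are assumptions.

```python
# Order rows from the table above, stored without the Total column.
orders = [
    {"order_no": "123J09", "customer_no": "NY65031", "unit_price": 500, "quantity": 2},
    {"order_no": "120J11", "customer_no": "ST90452", "unit_price": 300, "quantity": 1},
    {"order_no": "123J09", "customer_no": "NY65031", "unit_price": 100, "quantity": 4},
]


def order_total(order):
    """Compute the total from the stored columns rather than storing it."""
    return order["unit_price"] * order["quantity"]


totals = [order_total(order) for order in orders]
print(totals)  # [1000, 300, 400]
```

Because the total is computed when needed, it cannot drift out of step with the unit price or the quantity.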

Page 14: Normalization

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form.

This form deals with a certain type of anomaly that is not handled by 3NF.

A table is in BCNF when the left side of every nontrivial functional dependency is a superkey. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
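The BCNF condition can be checked mechanically with an attribute-closure computation. The sketch below is one possible way to do it; the function names and the student/course/instructor relation are assumptions chosen for illustration, not part of the slides.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List nontrivial dependencies whose left side is not a superkey."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not relation <= closure(lhs, fds)
    ]


# Hypothetical relation with two overlapping candidate keys:
# {student, course} and {student, instructor}.
relation = ("student", "course", "instructor")
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

print(bcnf_violations(relation, fds))  # [(('instructor',), ('course',))]
```

The relation satisfies 3NF because course is part of a candidate key, yet instructor alone is not a superkey, so the dependency instructor -> course violates BCNF.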

Page 15: Normalization

Thank you