
Normalization


Database Design

Given some body of data to be represented in a database, how do we decide on a suitable logical structure for that data?

We are concerned here with logical (or conceptual) design only, not physical design.

Database design is still very much an art, not a science.

Database design is not just a question of getting the data structure right – data integrity is a (perhaps the) key ingredient too.

We will be concerned for the most part with what might be termed application independent design.


Database Normalization

Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability.

In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them.

Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.

History

Edgar F. Codd first proposed the process of normalization, and what came to be known as the 1st normal form, in his paper "A Relational Model of Data for Large Shared Data Banks". Codd stated: “There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by ‘domains whose elements are atomic (nondecomposable) values.’”


Normalized Design: Pros and Cons

Pros of Normalizing

• More efficient database structure.
• Better understanding of your data.
• More flexible database structure.
• Easier to maintain database structure.
• Few (if any) costly surprises down the road.
• Validates your common sense and intuition.
• Avoids redundant fields.
• Ensures that distinct tables exist when necessary.

Cons of Normalizing

• You can't start building the database before you know what the user needs.


Functional Dependencies

Informal definition:

A many-to-one relationship between one set of attributes A and another set of attributes B in a given relation R, i.e. each value of A is associated with exactly one value of B, although many values of A may share the same value of B.

• Functional dependencies (FDs) tell us the meaning of the data (e.g. every supplier is located in only one city …).
• FDs represent integrity constraints.
• FDs are checked by the database management system (DBMS) at every update.
• So we are interested in finding the smallest set of FDs that captures the intended meaning of the data.


Definitions

A more formal definition:

Given R, an instance of a relation, and X and Y, arbitrary subsets of the attributes of R, Y is functionally dependent on X, written

X → Y

if and only if each X-value in R is associated with precisely one Y-value in R.
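The formal definition can be turned into a direct check on a relation instance. The sketch below is illustrative only: the function name fd_holds and the supplier rows are assumptions made for this example, not part of the original material.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names.
    """
    seen = {}  # maps each X-value to the single Y-value seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # the same X-value appears with two Y-values
    return True


# Hypothetical supplier data, used only for illustration.
suppliers = [
    {"S_ID": "S1", "City": "London"},
    {"S_ID": "S2", "City": "Paris"},
    {"S_ID": "S3", "City": "Paris"},
]

print(fd_holds(suppliers, ["S_ID"], ["City"]))  # True
print(fd_holds(suppliers, ["City"], ["S_ID"]))  # False
```

A check like this only tells us whether one particular instance satisfies the dependency; whether the dependency is meant to hold for all legal instances is a design decision.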


Use of Functional Dependencies

We use functional dependencies to:

test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set S of functional dependencies, we say that r satisfies S.

specify constraints on the set of legal relations; we say that S holds on R if all legal relations on R satisfy the set of functional dependencies S.

Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances.


Dependencies: Definitions

Partial Dependency

A partial dependency is a functional dependency A → B (B is functionally dependent on A) in which some attribute can be removed from A and the dependency still holds. For instance, suppose the dependency staffNo, sName → branchNo holds. Then for every (staffNo, sName) pair there is only one value of branchNo. But staffNo alone already determines branchNo, so sName can be removed from the determinant, and the dependency of branchNo on (staffNo, sName) is only partial.

Partial Dependency – when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.

CUSTOMER

Cust_ID   Name    Order_ID
101       AT&T    1234
101       AT&T    156
125       Cisco   1250

Partial dependency: Cust_ID → Name (Cust_ID is only a part of the composite primary key (Cust_ID, Order_ID)).
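A partial dependency can be spotted mechanically once the key and the functional dependencies are written down. The sketch below assumes the CUSTOMER relation has the composite primary key (Cust_ID, Order_ID) and the single dependency Cust_ID → Name; the function name partial_dependencies is hypothetical, and the check looks only at the dependencies as listed (it does not derive implied ones).

```python
def partial_dependencies(key, fds):
    """Return the listed FDs whose determinant is a proper subset of the
    composite key and whose dependents include a non-key attribute."""
    key = frozenset(key)
    result = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        # lhs < key means "proper subset": part of the key, not all of it
        if lhs and lhs < key and rhs - key:
            result.append((sorted(lhs), sorted(rhs - key)))
    return result


# Assumed key and dependency for the CUSTOMER relation.
customer_key = ("Cust_ID", "Order_ID")
customer_fds = [(("Cust_ID",), ("Name",))]

print(partial_dependencies(customer_key, customer_fds))
# [(['Cust_ID'], ['Name'])]
```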


Dependencies: Definitions

Transitive Dependency – a transitive dependency exists where A → B and B → C, and therefore A → C (provided that B → A and C → A do not hold).

EMPLOYEE

Emp_ID   F_Name   L_Name   Dept_ID   Dept_Name
111      Mary     Jones    1         Acct
122      Sarah    Smith    2         Mktg

Transitive dependency: Emp_ID → Dept_ID and Dept_ID → Dept_Name, therefore Emp_ID → Dept_Name.
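The definition of a transitive dependency can be sketched as a search for chains A → B → C in a list of single-attribute dependencies. The function name transitive_chains and the dependency list for EMPLOYEE are assumptions for illustration; real dependencies may have several attributes on either side, which this simplified version does not handle.

```python
def transitive_chains(fds):
    """Find (A, B, C) such that A -> B and B -> C are listed and B -> A is not.

    fds is a list of (determinant, dependent) pairs of single attributes.
    """
    listed = set(fds)
    chains = []
    for a, b in fds:
        for b2, c in fds:
            if b2 == b and c != a and (b, a) not in listed:
                chains.append((a, b, c))
    return chains


# Assumed dependencies for the EMPLOYEE relation.
employee_fds = [
    ("Emp_ID", "F_Name"),
    ("Emp_ID", "L_Name"),
    ("Emp_ID", "Dept_ID"),
    ("Dept_ID", "Dept_Name"),
]

print(transitive_chains(employee_fds))
# [('Emp_ID', 'Dept_ID', 'Dept_Name')]
```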


Normal Forms: Review

Unnormalized – There are multivalued attributes or repeating groups

1 NF – No multivalued attributes or repeating groups.

2 NF – 1 NF plus no partial dependencies

3 NF – 2 NF plus no transitive dependencies

Example 1: Determine NF

BOOK (ISBN, Title, Publisher, Address); primary key: ISBN

Functional dependencies:
ISBN → Title
ISBN → Publisher
Publisher → Address

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.

There is no COMPOSITE primary key, therefore there can't be partial dependencies. Therefore, the relation is at least in 2NF.

Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.

We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.

In your solution you will write the following justification:

1) No M/V attributes, therefore at least 1NF
2) No partial dependencies, therefore at least 2NF
3) There is a transitive dependency (Publisher → Address), therefore not 3NF

Conclusion: The relation is in 2NF

Example 2: Determine NF

ORDER (Order_No, Product_ID, Description); primary key: (Order_No, Product_ID)

Functional dependency:
Product_ID → Description

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.

There is a COMPOSITE primary key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of the PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not in 2NF. There is no point in checking for transitive dependencies.

We know that the relation is at least in 1NF, and it is not in 2NF. Therefore, we conclude that the relation is in 1NF.

In your solution you will write the following justification:

1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency (Product_ID → Description), therefore not in 2NF

Conclusion: The relation is in 1NF
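The reasoning used in Examples 1 and 2 can be sketched as a small classifier. This is a simplified illustration under stated assumptions: the relation is already in 1NF, it has a single candidate key, and only the dependencies as listed are examined. The function name normal_form is hypothetical.

```python
def normal_form(key, fds):
    """Classify a 1NF relation as '1NF', '2NF' or '3NF' from its listed FDs.

    key is a tuple of key attributes; fds is a list of
    (determinant attributes, dependent attributes) pairs.
    """
    key = frozenset(key)
    partial = transitive = False
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_key_dependents = frozenset(rhs) - key - lhs
        if not lhs or not non_key_dependents:
            continue
        if lhs < key:            # determinant is only part of the key
            partial = True
        elif not (lhs & key):    # determinant is made of non-key attributes
            transitive = True
    if partial:
        return "1NF"
    if transitive:
        return "2NF"
    return "3NF"


book_fds = [
    (("ISBN",), ("Title",)),
    (("ISBN",), ("Publisher",)),
    (("Publisher",), ("Address",)),
]
order_fds = [(("Product_ID",), ("Description",))]

print(normal_form(("ISBN",), book_fds))                   # 2NF
print(normal_form(("Order_No", "Product_ID"), order_fds))  # 1NF
```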

Example 3: Determine NF

PART (Part_ID, Descr, Price, Comp_ID, No)

Functional dependencies:
Part_ID → Description
Part_ID → Price
Part_ID, Comp_ID → No

Comp_ID and No are not determined by the primary key; therefore, the relation is NOT in 1NF. There is no point in looking at partial or transitive dependencies.

In your solution you will write the following justification:

1) There are M/V attributes; therefore, not 1NF

Conclusion: The relation is not normalized.

Bringing a Relation to 1NF

STUDENT

Stud_ID   Name      Course_ID   Units
101       Lennon    MSI 250     3.00
101       Lennon    MSI 415     3.00
125       Johnson   MSI 331     3.00

Option 1: Make a determinant of the repeating group (or the multivalued attribute) a part of the primary key. Here this gives the composite primary key (Stud_ID, Course_ID).

Option 2: Remove the entire repeating group from the relation. Create another relation which contains all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group together comprise the primary key.

STUDENT_COURSE

Stud_ID   Course_ID   Units
101       MSI 250     3.00
101       MSI 415     3.00
125       MSI 331     3.00

STUDENT

Stud_ID   Name
101       Lennon
125       Johnson
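Option 2 can be sketched in a few lines. The nested "Courses" list below is an assumed way of representing the repeating group before normalization, and the function name split_repeating_group is hypothetical.

```python
# Unnormalized STUDENT: the courses form a repeating group inside each row.
# This nested layout is an assumption made for the example.
students = [
    {"Stud_ID": 101, "Name": "Lennon",
     "Courses": [("MSI 250", 3.0), ("MSI 415", 3.0)]},
    {"Stud_ID": 125, "Name": "Johnson",
     "Courses": [("MSI 331", 3.0)]},
]


def split_repeating_group(rows):
    """Move the repeating group into its own relation (Option 2)."""
    # STUDENT keeps only the attributes outside the repeating group.
    student = [{"Stud_ID": r["Stud_ID"], "Name": r["Name"]} for r in rows]
    # STUDENT_COURSE gets one row per course, keyed by (Stud_ID, Course_ID).
    student_course = [
        {"Stud_ID": r["Stud_ID"], "Course_ID": course_id, "Units": units}
        for r in rows
        for course_id, units in r["Courses"]
    ]
    return student, student_course


student, student_course = split_repeating_group(students)
for row in student + student_course:
    print(row)
```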

Bringing a Relation to 2NF

Goal: Remove partial dependencies.

STUDENT (composite primary key: Stud_ID, Course_ID)

Stud_ID   Name      Course_ID   Units
101       Lennon    MSI 250     3.00
101       Lennon    MSI 415     3.00
125       Johnson   MSI 331     3.00

Partial dependencies: Stud_ID → Name and Course_ID → Units.

Remove attributes that are dependent on a part but not the whole of the primary key from the original relation. For each partial dependency, create a new relation, with the corresponding part of the primary key from the original as the primary key.

STUDENT_COURSE

Stud_ID   Course_ID
101       MSI 250
101       MSI 415
125       MSI 331

COURSE

Course_ID   Units
MSI 250     3.00
MSI 415     3.00
MSI 331     3.00

STUDENT

Stud_ID   Name
101       Lennon
125       Johnson
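The 2NF decomposition amounts to projecting the original relation onto each group of attributes and dropping duplicate rows. A minimal sketch, assuming the composite key (Stud_ID, Course_ID) and the partial dependencies Stud_ID → Name and Course_ID → Units; the helper name project is hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    seen = set()
    out = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


student_flat = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.0},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.0},
    {"Stud_ID": 125, "Name": "Johnson", "Course_ID": "MSI 331", "Units": 3.0},
]

student = project(student_flat, ["Stud_ID", "Name"])            # Stud_ID -> Name
course = project(student_flat, ["Course_ID", "Units"])          # Course_ID -> Units
student_course = project(student_flat, ["Stud_ID", "Course_ID"])  # the key itself

print(student)
print(course)
print(student_course)
```

Because duplicates are dropped, STUDENT ends up with one row per student even though student 101 appears twice in the original relation.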

Bringing a Relation to 3NF

Goal: Get rid of transitive dependencies.

EMPLOYEE

Emp_ID   F_Name   L_Name   Dept_ID   Dept_Name
111      Mary     Jones    1         Acct
122      Sarah    Smith    2         Mktg

Transitive dependency: Emp_ID → Dept_ID → Dept_Name

Remove the attributes that are dependent on a non-key attribute from the original relation. For each transitive dependency, create a new relation with the non-key attribute that is the determinant in the transitive dependency as its primary key, and the dependent non-key attribute as a dependent.

EMPLOYEE

Emp_ID   F_Name   L_Name   Dept_ID
111      Mary     Jones    1
122      Sarah    Smith    2

DEPARTMENT

Dept_ID   Dept_Name
1         Acct
2         Mktg
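As noted earlier, normalized tables must be re-joined when queried. The sketch below builds the two 3NF relations in an in-memory SQLite database through Python's sqlite3 module and joins them to recover the original EMPLOYEE rows. The column types and the foreign-key declaration are assumptions; the slides do not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        Dept_ID   INTEGER PRIMARY KEY,
        Dept_Name TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        Emp_ID  INTEGER PRIMARY KEY,
        F_Name  TEXT,
        L_Name  TEXT,
        Dept_ID INTEGER REFERENCES DEPARTMENT(Dept_ID)
    );
    INSERT INTO DEPARTMENT VALUES (1, 'Acct'), (2, 'Mktg');
    INSERT INTO EMPLOYEE VALUES
        (111, 'Mary', 'Jones', 1),
        (122, 'Sarah', 'Smith', 2);
""")

# Joining on Dept_ID rebuilds the rows of the original, unsplit relation.
rows = conn.execute("""
    SELECT e.Emp_ID, e.F_Name, e.L_Name, e.Dept_ID, d.Dept_Name
    FROM EMPLOYEE AS e
    JOIN DEPARTMENT AS d ON d.Dept_ID = e.Dept_ID
    ORDER BY e.Emp_ID
""").fetchall()

for row in rows:
    print(row)
```

Each department name is now stored once, so renaming a department is a single-row update in DEPARTMENT.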

Boyce-Codd Normal Form (BCNF)

Definition: A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key, where a candidate key is a column (or minimal set of columns) in a table that could serve as the primary key.
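The BCNF condition can be checked with an attribute closure: a determinant qualifies if its closure contains every attribute of the relation. The sketch below tests the slightly more general superkey condition on the listed, non-trivial dependencies. The function names closure and bcnf_violations are hypothetical, and the EMPLOYEE dependencies are assumed for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial listed FDs whose determinant is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)           # skip trivial dependencies
        and closure(lhs, fds) != all_attrs    # determinant is not a superkey
    ]


employee_attrs = ["Emp_ID", "F_Name", "L_Name", "Dept_ID", "Dept_Name"]
employee_fds = [
    (("Emp_ID",), ("F_Name", "L_Name", "Dept_ID")),
    (("Dept_ID",), ("Dept_Name",)),
]

print(bcnf_violations(employee_attrs, employee_fds))
# [(('Dept_ID',), ('Dept_Name',))]
```

Dept_ID is a determinant but not a key of the unsplit EMPLOYEE relation, so that relation is not in BCNF.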