37
September 24, 20 01 91.2914 R McFadyen 1 The objective of normalization is sometimes stated “to create relations where every dependency is on •the primary key, •the whole primary key, and •nothing but the primary key” Database designers are always looking for (OLTP) databases that are as simple as possible - ones that are easiest to keep consistent, ones where the semantics are clear. Normalization

September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

  • View
    233

  • Download
    0

Embed Size (px)

Citation preview

Page 1: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 1

The objective of normalization is sometimes stated

“to create relations where every dependency is on

•the primary key,

•the whole primary key, and

•nothing but the primary key”

Database designers are always looking for (OLTP) databases that are as simple as possible - ones that are easiest to keep consistent, ones where the semantics are clear.

Normalization

Page 2: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 2

Certain relation schemas have update anomalies

- they may be difficult to understand and maintain

Normalization theory recognizes this and gives us some principles to guide our designs

Normal Forms: 1NF, 2NF, 3NF, BCNF, 4NF, … are each an improvement on the previous ones in the list

Normalization

Page 3: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 3

Normalization is a process (involves decomposition) that generates tables of higher normal forms.

Denormalization moves from higher to lower forms and is done for performance reasons.

Normalization

Page 4: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 4

Suppose we have

EmployeeProject

Anomalies

EmployeeProject holds information about employees and the projects they work on

•emp_num: employee’s social insurance number•proj_num: a project number•ep_hours: the hours the employee has worked on the project•emp_name: the employee’s name•proj_loc: the location of the project •PK is {emp_num, proj_num}

emp_num proj_num ep_hours emp_name proj_loc

Page 5: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 5

An instance of the table:

emp_num proj_num ep_hours emp_name

EmployeeProject

proj_loc

Anomalies

11 23 6 Jones Edmt

11 36 2 Jones Wpg

11 99 22 Jones Brandon

17 23 5 Smith Edmt

17 36 2 Smith Wpg

17 101 13 Smith Wpg

Page 6: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 6

EmployeeProject

Anomalies

we will have redundant information in the database …

•if more than one employee works on the same project, then the project location is repeated

•if some employee works on more than one project, then the employee’s name is repeated

emp_num proj_num ep_hours emp_name proj_loc

Page 7: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 7

Redundant data leads to

•additional space requirements

•update anomalies

Anomalies

Page 8: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 8

Suppose EmployeeProject is the only relation where the Project Location is recorded

insert anomalies

•adding a new project is complicated, unless there is also an employee for that project

deletion anomalies

•if we delete all employees for some project, then what should happen to the project information?

modification anomalies

•if we update the location of a project, then we must change it in all rows referring to that project

Anomalies

Page 9: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 9

If we design a database with a relation such as EmployeeProject then we will have complex update rules to enforce.

•difficult to code correctly

•will not be as efficient as possible

Such designs mix concepts.

E.g EmployeeProject mixes the Employee and Project concepts

Such designs are (generally) not good for OLTP

Anomalies

Page 10: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 10

Suppose we have a relation R comprising attributes X,Y, …

We say a functional dependency exists between the attributes X and Y,

if, whenever a tuple exists with the value x for X, it will always have the same value for y for Y.

emp_num emp_name

emp_gender

emp_phone

X Y

Employee

Given a specific employee number, there is only one value for name, one value for gender, and only one value for phone

Emp_number is a determinant for emp_name, emp_gender, emp_phone

Functional dependencies

emp_num emp_gender emp_name emp_phone

Page 11: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 11

•a series of normal forms are known that have, successively, better update characteristics

•we’ll consider 1NF, 2NF, 3NF, and BCNF

•a technique used to improve a relation is decomposition, where one relation is replaced by two or more relations. When we do so, we want to eliminate update anomalies without losing any information.

•Our target is BCNF

Normal Forms

Page 12: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 12

The Dependency Diagram

� illustrates a single-attribute PK (simple PK)

� all employee attributes are dependent on the PK

Employee emp_num emp_gender emp_name emp_phone

Page 13: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 13

Partial Dependency

PROJ_CODE EMP_NUM PROJ_NAME HRS_WORKED

� Multi-attribute PK (composite PK)� HRS_WORKED is dependent on the PK� But PROJ_CODE, which is only a part of the PK, determines PROJ_NAME (equivalently, PROJ_NAME is dependent on PROJ_CODE)

Partial dependency

Page 14: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 14

Transitive Dependency

STU_NUM STU_LNAME DEPT_CODE DEPT_NAME

� All student attributes are dependent on the PK� But DEPT_CODE determines DEPT_NAME (or

DEPT_NAME is dependent on DEPT_CODE, a non-key attribute

Transitive dependency

Page 15: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 15

BCNF

•all determinants are candidate keys

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

Example, consider:

Partial dependency

Transitive dependency

Page 16: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 16

BCNF - all determinants are candidate keys

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

Determinants:

prod_code prod_title

inv_num cus_num

inv_num, prod_code line_units

Three determinants, but only one is a candidate key. Therefore, not in BCNF

Page 17: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 17

BCNF - all determinants are candidate keys

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

Three determinants:

•prod_code

•inv_num

•{prod_code, line_units}

but only one is a candidate key.

Therefore, not in BCNF

A table with these

attributes will have a lot

of redundancy.

Page 18: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 18

Decomposition

We can decompose the single table into three tables where there will be

no unnecessary redundancy

no loss of information - we can join the three to have what we had before

no loss of dependencies

INV_NUM LINE_NUM PROD_CODE

PROD_TITLE

LINE_UNITS

INV_NUM CUS_NUM PROD_CODE

Page 19: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 19

Decomposition

Each of these tables is in BCNF

INV_NUM LINE_NUM PROD_CODE

PROD_TITLE

LINE_UNITS

INV_NUM CUS_NUM

PROD_CODE

Page 20: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 20

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

Consider:

This table is not in 2NF because it has a partial dependency: CUS_NUM is dependent on INV_NUM

Partial dependency

Page 21: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 21

First Normal Form (1NF)

Each row/column intersection contains one and only one value, rather than a set of values

All attributes are dependent on the primary key

Page 22: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 22

EMP_NUM EMP_LNAME EMP_FNAME EMP_DOB

This table is in 1NF.

Question to answer later: is it in 2NF? 3NF? BCNF?

First Normal Form (1NF)

Consider

Page 23: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 23

First Normal Form (1NF)

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

This table is in 1NF.

Question to answer later: is it in 2NF? 3NF? BCNF?

Consider

Page 24: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 24

The Second Normal Form (2NF)

� Meets 1NF requirements

� Does not contain partial dependencies

� But may contain transitive dependencies

Page 25: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 25

Decomposition

A table can be decomposed into two or more tables. Ideally, this involves:

•no loss of information•all information previously available can be obtained by joining the new tables - a lossless decomposition

•no loss of dependencies•a dependency-preserving decomposition

Page 26: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 26

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

Consider:

This table is not in 2NF because it has a partial dependency: CUS_NUM is dependent on INV_NUM

Partial dependency

Page 27: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 27

INV_NUM LINE_NUM PROD_CODE PROD_TITLE CUS_NUM LINE_UNITS

Consider:

We need to decompose the table - move the partial dependency to a new table. I.e. Invoice information belongs elsewhere.

Partial dependency

Page 28: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 28

Decomposition

create a new table

don’t lose any information (can still derive)

INV_NUM LINE_NUM PROD_CODE PROD_TITLE LINE_UNITS

INV_NUM CUS_NUM

No partial

dependencies

present!

Page 29: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 29

The Third Normal Form (3NF)

� Meets 2NF requirements

� Does not contain transitive dependencies

Page 30: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 30

Consider:

Because of the transitive dependency, this table is not in 3NF

Transitive dependency

INV_NUM LINE_NUM PROD_CODE PROD_TITLE LINE_UNITS

We need to decompose the table - move the transitive dependency to a new table. I.e. Product information belongs elsewhere.

Page 31: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 31

INV_NUM LINE_NUM PROD_CODE LINE_UNITS

Decomposition

create a new table

don’t lose any information (can still derive)

There is no transitive

dependency here!PROD_CODE PROD_TITLE

Page 32: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 32

The Boyce-Codd Normal Form (BCNF)

� Meets 3NF requirements� Every determinant in the table is a candidate key� Focus is on determinants

Page 33: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 33

Example

consider the following functional dependencies and table structure

city, streetname postalcode

postalcode city

city streetname postalcode

Note that this table does have redundant data, and from a theoretical perspective, would have anomalies associated with it.

Address

Page 34: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 34

Example

city, streetname postalcode

postalcode city

city streetname postalcode

Let us consider the more formal/complete definitions in the 91.3902 text, and then ask:

What are the non-key attributes?

What is the primary key?

What is the normal form of Address?

Address

Page 35: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 35

From the 91.3902 text: A relation R is in 2NF if every nonprime attribute of A is not partially dependent on any key R.

nonprime: an attribute is nonprime if it is not a member of a candidate key.prime: an attribute is prime if it is a member of a candidate key.

city streetname postalcodeAddress

In our example, there are no nonprime attributes

Hence the table is in 2NF

Page 36: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 36

From the 91.3902 text: A relation R is in 3NF if, whenever a functional dependency x --> A holds in R, either

a) X is a superkey of R, orb) A is a prime attribute of R.

One of the things that this says is: if a nonprime attribute A is dependent on some attribute, that determinant must include the key.

city streetname postalcodeAddress

In our example, there are no nonprime attributesHence the table is in 3NF

Page 37: September 24, 200191.2914 R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary

September 24, 2001 91.2914 R McFadyen 37

Is every determinant a candidate key?

NO

city streetname postalcodeAddress

Hence the table is not BCNF

What decomposition would preserve dependencies and have BCNF tables? Is this a practical example?