
Normalization Theory 3 - University of Cyprus
cs.ucy.ac.cy/~epl242/lectures/Normalization_Theory_3.pdf


Normal Forms
First Normal Form (1NF)
• Objectives of 1NF
- The schema of an unnormalized relation gives no clue as to which attributes can have multiple values.
- The semantics of a 1NF relation are more explicit.
- The relational operators are applicable only to flat (that is, 1NF) relations.
• Problems: all the anomalies discussed previously
• Update anomalies
• Deletion anomalies
• Insertion anomalies

Full Functional Dependency
Definition:

In a relation R, attribute B of R is “fully functionally dependent” on an attribute or set of attributes A of R if B is functionally dependent on A but not functionally dependent on any proper subset of A.
Example: Let’s consider the following relation

FD’s:

Second Normal Form (2NF)
Definition:

A relation is in second normal form (2NF) if and only if it is in 1NF and all nonkey attributes are fully dependent on the key.

Clearly, if the relation is in 1NF and the key consists of a single attribute, the relation is automatically in 2NF.

For example the previous relation is not in 2NF.

The class relation is not in 2NF.

• Intuitively, a relation might not be in 2NF if it is trying to describe information about more than one entity (i.e., through a many-to-many relationship).
- In the previous example: the student and the course entities.
- In EMP_PROJ: information about projects and employees.
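The 2NF test can be sketched in Python as a search for nonkey attributes that are determined by a proper subset of the key. This is only an illustration: the CLASS attributes and FDs below are assumed for the example (the slide's own relation is not reproduced here), and the sketch assumes a single candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Return (subset_of_key, nonkey_attribute) pairs that violate 2NF."""
    key = set(key)
    nonkey = set(attributes) - key
    violations = []
    for size in range(1, len(key)):            # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & nonkey):
                violations.append((subset, attr))
    return violations

# Hypothetical CLASS relation with composite key (COURSE, STUID).
attributes = {"COURSE", "STUID", "STUNAME", "ROOM", "GRADE"}
fds = [
    (("COURSE", "STUID"), ("GRADE",)),   # full dependency on the whole key
    (("STUID",), ("STUNAME",)),          # depends on part of the key
    (("COURSE",), ("ROOM",)),            # depends on part of the key
]
violations = partial_dependencies(("COURSE", "STUID"), attributes, fds)
print(violations)
```

An empty result means every nonkey attribute is fully dependent on the key; here STUNAME and ROOM are reported, so this assumed relation is not in 2NF.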

Transforming a 1NF relation to a 2NF relation:

1. Identify each nonfull functional dependency.
2. Form projections by removing the attributes that depend on each of the determinants so identified.
3. Place these determinants in separate relations along with their dependent attributes.
4. The original relation still contains the composite key and any attributes that are fully functionally dependent on it.

Projections of this type are called “lossless projections” because the original relation can be reconstructed by taking the natural join of the resulting projections.
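The reconstruction claim can be checked on a small instance. The sketch below uses plain Python lists of dicts as relations; the CLASS data, the attribute names, and the helper functions `project`, `natural_join`, and `as_set` are all assumptions made for illustration.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

def natural_join(left, right):
    """Join two lists of dict rows on every attribute name they share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent view of a relation, used only for comparison."""
    return {frozenset(row.items()) for row in rows}

# Hypothetical 1NF relation with key (COURSE, STUID).
cols = ("COURSE", "STUID", "STUNAME", "ROOM", "GRADE")
class_rel = [dict(zip(cols, t)) for t in [
    ("ART103", "S1001", "Smith", "H221", "A"),
    ("ART103", "S1010", "Burns", "H221", "B"),
    ("CSC201", "S1001", "Smith", "M110", "B"),
]]

# Steps 2-4: one projection per determinant, plus the fully dependent part.
enroll = project(class_rel, ("COURSE", "STUID", "GRADE"))
student = project(class_rel, ("STUID", "STUNAME"))
course = project(class_rel, ("COURSE", "ROOM"))

rebuilt = natural_join(natural_join(enroll, student), course)
print(as_set(rebuilt) == as_set(class_rel))
```

The comparison holds because each projection keeps a determinant together with the attributes that depend on it.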

Example:

FD’s:

2NF

• Instance of the previous relations.

Example:

Objectives of 2NF:

• The semantics of a 2NF relation are more explicit: all the attributes are dependent on the entire primary key.

• Databases designed with 2NF relations avoid undesirable update anomalies present in 1NF relations.

• The schema of a 1NF relation gives no clue as to which attributes are dependent on which other attributes.
- Knowing that a relation is in 2NF means that no attribute is dependent on only part of the key.

REVIEW:
• Transitive Dependency (Armstrong’s 3rd axiom)

- Let’s consider the following relation

• FD’s:

• STUID functionally determines STATUS in two ways: directly, and transitively through CREDITS.

• So the attribute STATUS is said to be transitively dependent on the attribute STUID.
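Transitivity can be seen by computing an attribute closure. The sketch below assumes only the two dependencies implied by the text, STUID → CREDITS and CREDITS → STATUS; the `closure` helper is illustrative and not part of the slides.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Only the two FDs suggested by the text; STUID -> STATUS is NOT listed.
fds = [
    (("STUID",), ("CREDITS",)),
    (("CREDITS",), ("STATUS",)),
]
stuid_closure = closure({"STUID"}, fds)
print(sorted(stuid_closure))
```

STATUS ends up in the closure of STUID even though no FD states it directly, which is exactly the transitive dependency described above.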

Third Normal Form (3NF)
Definition: A relation is in third normal form (3NF) if and only if
- it is in 2NF and
- no nonkey attribute is “transitively dependent” on the key.

Example:
- The following relation is in 2NF but not in 3NF,

- because the nonkey attribute STATUS is transitively dependent on the key, STUID.

• Clearly, a 2NF relation with one nonkey attribute must always be a 3NF relation.

• Transforming a 2NF relation to a 3NF relation:
1. Look to see whether any nonkey attribute is functionally dependent on another nonkey attribute.
2. Remove the functionally dependent attribute from the relation, placing it in a new relation with its determinant.
3. The determinant can remain in the original relation.
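A minimal sketch of one pass of these three steps follows. The function name `to_3nf_step` and the three-attribute STUDENT schema are assumptions for illustration, and the sketch treats every attribute outside the single key as a nonkey attribute.

```python
def to_3nf_step(attributes, key, fds):
    """Split off one nonkey -> nonkey dependency, if there is one.

    Returns (remaining_relation, new_relation); new_relation is None
    when no such dependency is found.
    """
    attributes, key = set(attributes), set(key)
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        # Step 1: determinant and dependent are both outside the key.
        if lhs.isdisjoint(key) and rhs.isdisjoint(key) and lhs | rhs <= attributes:
            new_relation = lhs | rhs       # Step 2: dependent moves with its determinant
            remaining = attributes - rhs   # Step 3: determinant stays behind
            return remaining, new_relation
    return attributes, None

# Hypothetical STUDENT schema reduced to the attributes named in the text.
fds = [
    (("STUID",), ("CREDITS",)),
    (("CREDITS",), ("STATUS",)),
]
remaining, new_relation = to_3nf_step({"STUID", "CREDITS", "STATUS"}, {"STUID"}, fds)
print(sorted(remaining), sorted(new_relation))
```

Because CREDITS stays in the original relation, it acts as the join attribute that lets the two projections be recombined.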

• Example:

3NF

Example:

• Intuitively, we see that ED1 and ED2 represent independent entity facts about employees and departments.
• The NATURAL JOIN operation on ED1 and ED2 will recover the original relation EMP_DEPT without generating spurious tuples.

Objectives of 3NF:

• The semantics of a 3NF relation are more explicit: all the attributes are dependent ONLY on the primary key.

• Databases designed with 3NF relations avoid undesirable update anomalies present in 2NF relations.

• The schema of a 2NF relation gives no clue as to which nonkey attributes are dependent on which other nonkey attributes.
- Knowing that a relation is in 3NF means that no nonkey attribute is dependent on another nonkey attribute.

BOYCE-CODD Normal Form (BCNF)
• The definition of 3NF given so far is adequate for relations that have a single candidate key.
• It was found to be deficient in cases where there are:
- multiple candidate keys
- composite candidate keys

Example:

Constraints:
1. No two faculty members within a single department have the same name.
2. Each faculty member has only one office.
3. A department may have several faculty offices.
4. Faculty members from the same department may share offices.

Resulting FDs:

- 2NF?
- 3NF?

BOYCE-CODD Normal Form (BCNF)
Definition:

A relation is in Boyce-Codd normal form (BCNF) if and only if

- every determinant is a candidate key

Q: Is the previous relation in BCNF?
- NO, because OFFICE is a determinant but is not a candidate key.
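The BCNF test can be sketched as a check that the determinant of every nontrivial FD is a superkey (the usual operational form of "every determinant is a candidate key"). The FACULTY attributes and the two FDs below are assumptions inferred from the stated constraints, not taken from the slide's figure.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                 # ignore trivial FDs
        and not closure(lhs, fds) >= attributes     # determinant must reach everything
    ]

# Assumed schema and FDs for the faculty example.
attributes = {"FACNAME", "DEPT", "OFFICE"}
fds = [
    (("FACNAME", "DEPT"), ("OFFICE",)),   # name is unique within a department
    (("OFFICE",), ("DEPT",)),             # an office belongs to one department
]
violations = bcnf_violations(attributes, fds)
print(violations)
```

Only OFFICE → DEPT is reported: OFFICE determines DEPT but not FACNAME, so it is a determinant that is not a key.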

A relational schema in BCNF is:

Objectives of BCNF:

• The semantics of multiple candidate keys are more explicit: all the attributes are dependent ONLY on the candidate key.

• Databases designed with BCNF relations avoid undesirable update anomalies present in 3NF relations.

• In the previous example: we cannot delete a faculty member from a department without losing information about an office (assuming he is the only occupant).
- That is because OFFICE is a determinant but not a candidate key.

Example of Functional Dependencies and Normal Forms.

• Consider the following universal relation that stores information about projects in a large business.

Semantics:
1. Each project has a unique name, but names of employees and managers are not unique.
2. Each project has one manager, whose name is stored in PROJMGR.
3. Many employees may be assigned to work on each project, and an employee may be assigned to more than one project.
4. HOURS tells the number of hours per week that a particular employee is assigned to work on a particular project.
5. BUDGET stores the amount budgeted for a project, and STARTDATE gives the starting date for the project.
6. SALARY gives the annual salary of an employee.
7. EMPMGR gives the name of the employee’s manager, who is not the same as the project manager.
8. EMPDEPT gives the employee’s department. Department names are unique. The employee’s manager is the manager of the employee’s department.
9. RATING gives the employee’s rating for a particular project. The project manager assigns the rating at the end of the employee’s work on the project.

Solution
FD’s:

NORMAL FORMS:
1NF? With our composite key, each cell holds a single value, so WORK is in 1NF.
2NF? NO, because we have the following partial dependency.

We transform the relation into an equivalent set of 2NF relations by projection, resulting in:

3NF? PROJ and WORK1 are in 3NF, but EMP is not, because we have a transitive dependency:

Our new set of 3NF relations is therefore

BCNF? Yes, since in each relation the only determinant is the primary key.

The Normalization Process
• The process of finding a stable set of relations that is a faithful model of the enterprise.

• Decomposition (top-down process)
- Start with a universal relation.
- Identify functional dependencies.
- Use decomposition techniques to split the universal relation into a set of smaller relations.
The previous example was based on the decomposition approach.

Synthesis (bottom-up process)

• Begin with attributes and combine them into related groups, using functional dependencies to develop a set of normalized relations.

• A synthesis algorithm was developed by Bernstein.

• Basic steps:
- Make a list of all FDs.
- Group together those with the same determinant.
- Construct a relation for each group.

Synthesis (bottom-up process)

Problems:
1. Some FDs have more attributes in the determinant than needed.
- We must eliminate extraneous attributes, or 2NF relations might not result.

2. We must eliminate redundant FDs before grouping, or 3NF relations might not result.

3. Two relations may appear to have different keys when in fact the keys are equivalent.

Synthesis (bottom-up process)

Improved Algorithm:
1. Make a list of all FDs.
2. Eliminate extraneous attributes in each FD.
3. Remove any redundant FDs and find a nonredundant covering of the input FDs.
- Combine FD groups with equivalent keys.
4. Group together those with the same determinant.
5. Construct a relation for each group.
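The steps above can be sketched as follows. This is a simplified illustration rather than Bernstein's full algorithm: it does not merge groups with equivalent keys, and the FDs over A, B, C, D are made up for the example.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    """Steps 1-3: single-attribute right sides, no extraneous attributes, no redundant FDs."""
    work = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    # Step 2: drop attributes the determinant does not need.
    for i, (lhs, rhs) in enumerate(work):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= closure(smaller, work):
                lhs = smaller
                work[i] = (lhs, rhs)
    # Step 3: drop FDs that the remaining ones already imply.
    cover = list(dict.fromkeys(work))
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover

def synthesize(fds):
    """Steps 4-5: group by determinant and build one relation per group."""
    groups = {}
    for lhs, rhs in minimal_cover(fds):
        groups.setdefault(lhs, set()).update(rhs)
    return [(sorted(lhs), sorted(lhs | deps)) for lhs, deps in groups.items()]

# Hypothetical input: A -> C is redundant, and B is extraneous in AB -> D.
fds = [
    (("A",), ("B",)),
    (("B",), ("C",)),
    (("A",), ("C",)),
    (("A", "B"), ("D",)),
]
relations = synthesize(fds)
print(relations)   # each entry is (key, attributes)
```

Skipping step 2 here would leave AB as a determinant and produce an extra relation keyed on (A, B); skipping step 3 would place C in the relation keyed on A, where it depends transitively on the key.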

Example: Consider the following set of FDs.

QUESTION: Using the synthesis approach, construct a set of 3NF relations.


Multivalued Dependencies
Consider the following relation:

Assume that:
1. A faculty member can belong to more than one department.
2. A faculty member can belong to several college-wide committees.
3. There is no relation between department and committee.
Consider the following figure.

• The resulting relation is in BCNF, but we still have update, insertion, and deletion anomalies, e.g.
– Update a committee that F101 belongs to from Budget to Advancement.

• The faculty member is not associated with only one department; he is associated with a particular set of departments and a particular set of committees that are independent of each other.
– This independence is the cause of the problem.

• Let R be a relation having attributes or sets of attributes A, B and C. There is a “multivalued dependence” of attribute B on attribute A if and only if:
– The set of B values associated with a given A value is independent of the C values.

• This definition makes the following true:
– Consider two values of C: C1 and C2. The set of values of B in rows of R with a given value of A and with C-value C1 must be exactly the same as the set of values of B in rows of R with that same A-value and with C-value C2.

• Unlike the rules for functional dependencies, which make certain tuples illegal, multivalued dependencies make certain tuples essential in a relation.
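The C1/C2 condition above translates directly into a check on a relation instance. In this sketch the attribute names FACID, DEPT, COMMITTEE, the department codes, and the "Curriculum" committee are assumed; only F101, Budget, and Advancement come from the text.

```python
from collections import defaultdict

def satisfies_mvd(rows, a, b, c):
    """Check A ->> B on an instance: for each A value, the set of B values
    must be the same for every C value that occurs with that A value."""
    b_sets = defaultdict(lambda: defaultdict(set))
    for row in rows:
        a_val = tuple(row[x] for x in a)
        b_val = tuple(row[x] for x in b)
        c_val = tuple(row[x] for x in c)
        b_sets[a_val][c_val].add(b_val)
    return all(
        len({frozenset(s) for s in by_c.values()}) == 1
        for by_c in b_sets.values()
    )

def make(rows):
    return [dict(zip(("FACID", "DEPT", "COMMITTEE"), r)) for r in rows]

# F101 belongs to two departments and two committees: all four rows are needed.
faculty = make([
    ("F101", "CSC", "Budget"),
    ("F101", "CSC", "Curriculum"),
    ("F101", "MTH", "Budget"),
    ("F101", "MTH", "Curriculum"),
])
# The update anomaly: changing Budget to Advancement in only one row.
bad_update = make([
    ("F101", "CSC", "Advancement"),
    ("F101", "CSC", "Curriculum"),
    ("F101", "MTH", "Budget"),
    ("F101", "MTH", "Curriculum"),
])
print(satisfies_mvd(faculty, ["FACID"], ["COMMITTEE"], ["DEPT"]))
print(satisfies_mvd(bad_update, ["FACID"], ["COMMITTEE"], ["DEPT"]))
```

The second instance fails because the committee sets for F101 now differ between the two departments, which is why a single logical change has to touch several rows.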

Definition:

Fourth Normal Form (4NF)
Definition: A relation is in 4NF <==> it is in BCNF and there are no nontrivial multivalued dependencies.
• The dependency A →→ B is called a trivial multivalued dependency if B is a subset of A or A ∪ B = R.
Example: The faculty relation is not in 4NF because of the nontrivial multivalued dependencies:

4NF

Objectives of 4NF:
• The semantics are more explicit:
- all dependencies are related.
• Databases designed with 4NF relations avoid undesirable update anomalies present in 3NF relations.
- In the previous example: we cannot drop a faculty member from a committee without losing information about the faculty member (assuming he belongs to only one committee).

• The schema of a BCNF relation gives no clue as to whether there are multivalued dependencies among the primary key’s components, nor is it clear which components of the primary key are independent of one another.
- Knowing that a relation is in 4NF means that no component of the key is independent of any other component.

Lossless Decomposition
Definition:

A decomposition of a relation R is a set of relations {R1, R2, …, Rn} such that:
- each Ri is a subset of R (Ri ⊆ R)
- the union of the Ri is R (∪ Ri = R)

Definition of “Lossless Decomposition”: A decomposition {R1, R2, …, Rn} of a relation R is called a “lossless decomposition” for R if the natural join of R1, R2, …, Rn produces exactly the relation R.

• Not every decomposition is lossless.

Example: Consider the relation

and the following decomposition which is not lossless
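A lossy decomposition can be demonstrated on a small instance. The relation below (NAME, ROLE, PROJ) and its data are hypothetical, chosen only to show spurious tuples; it is not the relation from the slide. The helpers `project` and `natural_join` are likewise illustrative.

```python
def project(header, rows, attrs):
    """Project a set of tuples onto attrs (duplicates vanish in the set)."""
    idx = [header.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(h1, r1, h2, r2):
    """Natural join of two (header, set-of-tuples) relations."""
    common = [a for a in h1 if a in h2]
    extra = [a for a in h2 if a not in h1]
    rows = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[h1.index(a)] == t2[h2.index(a)] for a in common):
                rows.add(t1 + tuple(t2[h2.index(a)] for a in extra))
    return list(h1) + extra, rows

header = ["NAME", "ROLE", "PROJ"]
original = {
    ("Smith", "Designer", "Nile"),
    ("Smith", "Programmer", "Amazon"),
    ("Jones", "Designer", "Amazon"),
}

# NAME determines neither ROLE nor PROJ, so joining on NAME invents rows.
r1 = project(header, original, ["NAME", "ROLE"])
r2 = project(header, original, ["NAME", "PROJ"])
joined_header, joined = natural_join(["NAME", "ROLE"], r1, ["NAME", "PROJ"], r2)

spurious = joined - original
print(sorted(spurious))
```

The join always contains the original tuples; the decomposition is lossy because it also contains tuples that were never in the original relation.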

• We can guarantee that the decomposition is lossless if:
– For each pair of relations that will be joined, the set of common attributes is a determinant of one of the relations.

• We can do this by placing functionally dependent attributes in a relation with their determinant and keeping the determinants themselves in the original relation.

Formal Definition:

• If R is decomposed into two relations {R1, R2}, the join is lossless <==> either of the following holds in the closure of the set of FD’s for R:
– R1 ∩ R2 → R1 − R2
– R1 ∩ R2 → R2 − R1
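In its usual textbook form, the two-relation test asks whether the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. A minimal sketch of that check, using an assumed STUDENT schema with the FDs STUID → CREDITS and CREDITS → STATUS:

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if the common attributes determine all of R1 - R2 or all of R2 - R1."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

fds = [
    (("STUID",), ("CREDITS",)),
    (("CREDITS",), ("STATUS",)),
]
good = lossless_binary({"STUID", "CREDITS"}, {"CREDITS", "STATUS"}, fds)
bad = lossless_binary({"STUID", "STATUS"}, {"CREDITS", "STATUS"}, fds)
print(good, bad)
```

The first split shares CREDITS, which determines STATUS, so it is lossless; the second shares only STATUS, which determines nothing else.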

• For a decomposition involving more than two relations, the previous test cannot be used.

• Testing for “lossless decomposition”:
– Given a relation schema R(A1, A2, …, An), a set of functional dependencies F, and a decomposition p = {R1, R2, …, Rm}.

• The following algorithm can be used to test whether the decomposition has a lossless join.

Steps:
1. Construct an m by n table S, with a column for each of the n attributes in R and a row for each of the m relations in the decomposition.

2. For each cell S(i,j) of S, if the attribute for the column, Aj, is in the relation for the row, Ri, then set S(i,j) = a(j), else set S(i,j) = b(i,j).

3. Consider each FD X → Y in F, repeating until no more changes can be made to S:
• Look for rows that agree on the X-columns.
• Equate their Y-column symbols: if one of them is a(j), set the others to a(j); otherwise make them the same b symbol.

4. If, after all possible changes have been made to S, a row is made up entirely of the symbols a(1), a(2), …, a(n), the join is lossless. If there is no such row, the join is lossy.
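The four steps can be sketched in Python as below. Symbols are modelled as tuples, `("a", j)` for a(j) and `("b", i, j)` for b(i,j); this encoding, the function name, and the schema R(A, B, C, D) with its FDs are assumptions made for the example.

```python
def lossless_join(attributes, decomposition, fds):
    """Tableau test for a lossless join, following steps 1-4."""
    # Steps 1-2: one row per relation, one column per attribute.
    table = [
        {a: (("a", j) if a in rel else ("b", i, j))
         for j, a in enumerate(attributes)}
        for i, rel in enumerate(decomposition)
    ]
    # Step 3: apply the FDs until the table stops changing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for y in rhs:
                        if r1[y] != r2[y]:
                            # "a" sorts before "b", so an a-symbol is kept if present.
                            keep, drop = sorted((r1[y], r2[y]))
                            for row in table:
                                if row[y] == drop:
                                    row[y] = keep
                            changed = True
    # Step 4: lossless if some row consists only of a-symbols.
    return any(
        all(row[a] == ("a", j) for j, a in enumerate(attributes))
        for row in table
    )

attributes = ["A", "B", "C", "D"]
fds = [(("A",), ("B",)), (("B",), ("C",)), (("C",), ("D",))]
three_way = lossless_join(attributes, [{"A", "B"}, {"B", "C"}, {"C", "D"}], fds)
disjoint = lossless_join(attributes, [{"A", "B"}, {"C", "D"}], fds)
print(three_way, disjoint)
```

In the three-way case the row for {A, B} picks up a(3) through B → C and then a(4) through C → D, so it becomes all a-symbols; in the second case the two rows never agree on any determinant, so nothing changes.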

Fifth Normal Form
Definition:

A relation is in 5NF if no remaining nonloss projections are possible, except the trivial one in which the key appears in each projection.

Definition: A decomposition p preserves a set of FD’s F if the union of all the FD’s that hold in the Ri implies all the dependencies in F.
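One common way to test this without computing every projected FD is to grow the determinant of each FD using only closures restricted to a single Ri at a time. The sketch below follows that approach; the function name `preserves` and both example schemas are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, decomposition):
    """True if every FD in fds follows from the FDs that hold inside the Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for rel in decomposition:
                rel = set(rel)
                # What the attributes of z inside Ri determine, seen only within Ri.
                gained = closure(z & rel, fds) & rel
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True

# Assumed faculty FDs: the BCNF split loses (FACNAME, DEPT) -> OFFICE.
fac_fds = [(("FACNAME", "DEPT"), ("OFFICE",)), (("OFFICE",), ("DEPT",))]
fac_ok = preserves(fac_fds, [{"OFFICE", "DEPT"}, {"FACNAME", "OFFICE"}])

# Assumed student FDs: the 3NF split keeps both dependencies.
stu_fds = [(("STUID",), ("CREDITS",)), (("CREDITS",), ("STATUS",))]
stu_ok = preserves(stu_fds, [{"STUID", "CREDITS"}, {"CREDITS", "STATUS"}])
print(fac_ok, stu_ok)
```

The faculty case shows that a decomposition can be lossless and in BCNF while still failing to preserve a dependency, since FACNAME and DEPT never appear together in one relation.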