41
1 Database Design Theory Which tables to have in a database Normalization

1 Database Design Theory Which tables to have in a database Normalization

  • View
    234

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 1 Database Design Theory Which tables to have in a database Normalization

1

Database Design Theory

Which tables to have in a database

Normalization

Page 2: 1 Database Design Theory Which tables to have in a database Normalization

2

Database Design Theory

Given some body of data to be represented in a database, as modelled in an E-R diagram, what is the most suitable logical structure for that data?

How do we decide on the appropriate tables and the attributes of the tables?

Page 3: 1 Database Design Theory Which tables to have in a database Normalization

3

Requirements

Accommodate data integrity. General integrity constraints

E.g., referential integrity Domain specific integrity constraints

E.g., no user can borrow more than 4 books.

Robust in the sense that the design should be application independent.

We try to achieve this through the elimination of redundancy.

Page 4: 1 Database Design Theory Which tables to have in a database Normalization

4

The Danger of Redundancy

Consider the example For students, we want to know student ID, name

and address. For courses, we need to know course ID, title and

lecturer. For employees, we need to know the employee ID,

name and department. For each department, we need to know the

department ID, the name and the location. For each enrollment, we need to know the grade.

Page 5: 1 Database Design Theory Which tables to have in a database Normalization

5

The Danger of Redundancy Continued One solution store everything in one big table

Appl(sid,name,addr,

cid,title,

eid, ename,

deptID, dname, loc,

grade) Clearly, this leads to redundancy.

For example, we need to store the student’s address for every course they have been registered for.

Page 6: 1 Database Design Theory Which tables to have in a database Normalization

6

The Danger of Redundancy. Conclusion If everything in one table, then

Greater space requirements Insertion anomalies

Cannot store information on student who has not passed a course yet.

Deletion anomalies We may want to delete a course but some student may be

registered only for that course. Update anomalies

If a student changes their address, many tuples need to be updated.

Danger of inconsistency in database.

Page 7: 1 Database Design Theory Which tables to have in a database Normalization

7

Good Database Design

The basic idea: A “good” database is one in which each table

consists of a primary key and a set of mutually independent attributes.

Strategy for achieving a good database design: Identify undesirable dependencies in a table and

decompose by projection.

Page 8: 1 Database Design Theory Which tables to have in a database Normalization

8

Functional Dependencies (FD’s)

Attribute (set) Y is functionally dependent on attribute (set) X if, whenever two tuples have the same value for X, they also have the same value for Y.

Notation: X Y X is called the determinant. An FD A B is non-trivial if and only if B A and

BA.

Page 9: 1 Database Design Theory Which tables to have in a database Normalization

9

Functional Dependencies in Our Example.

If everything is in one table, then these FD’s exist: sid name, addr cid title eid ename deptid dname, loc sid, cid grade

Note we also have sid, cid, eid, deptid

All other attributes

Page 10: 1 Database Design Theory Which tables to have in a database Normalization

10

Keys Again

A set of attributes X in a relation R is a superkey if every attribute in R is functionally dependent on X.

A candidate key is a minimal superkey. Alternate keys are candidate keys that have not

been selected as primary keys. A prime attribute is a member of a candidate key.

Page 11: 1 Database Design Theory Which tables to have in a database Normalization

11

Armstrong’s Axioms

Let X,Y and Z be sets of attributes of a relation R Reflexivity: (X Y) (X Y)

Augmentation

(X Y) (XZ YZ)

Transitivity

((X Y) & (Y Z)) (X Z)

Axioms are sound and complete Can derive all FDs that follow from a given set of

FDs. Derive only true FDs

Page 12: 1 Database Design Theory Which tables to have in a database Normalization

12

Some Consequences of Armstrong’s Axioms The following are implied by Armstrong’s axioms:

Decomposition

(X YZ) (X Y) Union

((X Y)&(X Z)) (X YZ) Pseudo transitivity

((X Y)&(WY Z)) (WX Z)

Page 13: 1 Database Design Theory Which tables to have in a database Normalization

13

Closure of a Set of Functional Dependencies If F is a set of functional dependencies, the

closure of F, F+, is the set of all functional dependencies logically implied by those in F.

Useful since it allows us to determine candidate keys (there must be functional dependency to all other attributes), but very expensive to compute.

Page 14: 1 Database Design Theory Which tables to have in a database Normalization

14

Closure Under a Set of Functional Dependencies Since F+ is too expensive to compute, we use

closure of X under a set of functional dependencies, X+.

(X Y) in F if and only if

Y X+. Since X+ is relatively easy to compute, we can

now verify whether X is a superkey.

Page 15: 1 Database Design Theory Which tables to have in a database Normalization

15

Computing X+.

To compute X+ under a set of FDs F:INPUT: X, F

OUTPUT: X+

S := X

WHILE

there is a (ZY) in F with Z S and Y S DO S := SY

ENDWHILE

X+ = S

Page 16: 1 Database Design Theory Which tables to have in a database Normalization

16

Decomposition

Recall that having identified undesirable FDs, we now need to decompose.

Decomposition: Let U be a relation scheme. A set of {R1,..,Rn} of

relation schemes is a decomposition of U if

R1 … Rn = U Every attribute of U occurs in at least one Ri.

Page 17: 1 Database Design Theory Which tables to have in a database Normalization

17

Desirable Properties of Decomposition

Decompositions should be Lossless Dependency preserving No redundancy Minimal number of tables

Sometimes, not all properties can be achieved simultaneously.

Page 18: 1 Database Design Theory Which tables to have in a database Normalization

18

Lossless Decomposition

Let {R1,..,Rn} a decomposition of U u relation instance over U Pi = Ri(u) for i from 1 to n

Then {R1,..,Rn} is a lossless decomposition if

u = P1 … P n

In other words, the original relation can be reconstructed.

Page 19: 1 Database Design Theory Which tables to have in a database Normalization

19

Dependency Preserving Decompositions

In decomposing a table, ensure that any FDs are easily enforceable.

Example: Relation U(A,B,C) FDs: A B, A C, B C If we decompose U into R(A,B) and S(B,C), then A

B, B C can be easily enforced when changing R or S.

Because of transitivity, A C is automatically enforced.

Page 20: 1 Database Design Theory Which tables to have in a database Normalization

20

Non-Dependency Preserving Decomposition

If we decompose U into R’(A,B) and S’(A,C), then enforcing A B and A C is easy.

However, B C becomes an interrelational constraint and can only be enforced through a join.

This decomposition is not dependency preserving.

Page 21: 1 Database Design Theory Which tables to have in a database Normalization

21

Normalization

Normal forms, as defined in relational database theory, are guidelines for the design of the tables in the database.

Normalization reduces redundancy. Important to remember why we want to avoid

redundancy Space requirements Insertion, deletion and update anomalies.

Page 22: 1 Database Design Theory Which tables to have in a database Normalization

22

The Normal Forms

First normal form Second normal form Third normal form Boyce-Codd normal form Fourth normal form Fifth normal form The normal forms are ordered in that everything in

2NF is also in 1NF. We ignore 5NF, as violations hardly occur in practice.

Page 23: 1 Database Design Theory Which tables to have in a database Normalization

23

First Normal Form

A relation is in 1NF iff the value of each attribute in a tuple is atomic.

A relation which is not in 1NF

sid cid grade

123 CS51S A

CS51T B

234 CS51S C

CS52S B

CS52T B

Page 24: 1 Database Design Theory Which tables to have in a database Normalization

24

Getting Tables into 1NF

Normalizing a table which is not in 1NF is easy: Simply repeat the other fields.

Thussid cid grade

123 CS51S A

123 CS51T B

234 CS51S C

234 CS52S B

234 CS52T B

Page 25: 1 Database Design Theory Which tables to have in a database Normalization

25

Second and Third Normal Form

Second and third normal form concern relationship between non-key and prime attributes.

Recall that a prime attribute is a member of a candidate key.

Under 2NF and 3NF, a non-key attribute value must provide a fact about the key, the whole key and nothing but the key.

Every non-prime attribute must be fully functionally dependent on a candidate key.

Page 26: 1 Database Design Theory Which tables to have in a database Normalization

26

Second Normal Form

2NF is violated when a non-key attribute depends on a proper subset of a candidate key.

The following violates 2NF

Result(cid, sid, name, grade)

as name is functionally dependent on sid alone.

Page 27: 1 Database Design Theory Which tables to have in a database Normalization

27

Dangers of Violating 2NF

Note that name is repeated for every course that a student has a grade for.

Problems: Danger of inconsistency if a student changes their

name, e.g., by getting married. If a student has not passed any courses yet, then

the student’s name cannot be stored.

Page 28: 1 Database Design Theory Which tables to have in a database Normalization

28

Getting Tables into 2NF

Decompose the table intoResult(cid, sid, grade)

Student(sid, name)

This decomposition leads to longer retrieval times for queries which involve joins.

Normalization is necessary to avoid anomalies which arise because of changes to attributes.

If little chance of changes, then sometimes do not normalize.

Page 29: 1 Database Design Theory Which tables to have in a database Normalization

29

Third Normal Form

Third normal form is violated when a non-prime attribute depends on another non-prime attribute.

The following violates 3NF

Empl(eid, dept, loc)

loc is a fact about dept. Danger same as violation of 2NF.

Page 30: 1 Database Design Theory Which tables to have in a database Normalization

30

Getting Tables into 3NF

Again, decomposeEmpl(eid, dept)

Department(dept, loc)

We can always restore 3NF through a lossless and dependency preserving decomposition.

Page 31: 1 Database Design Theory Which tables to have in a database Normalization

31

Boyce-Codd Normal Form (BCNF)

A relation scheme R is in BCNF if every determinant of a FD over R is a candidate key.

In other words, the determinant of every FD is a superkey.

Violation of BCNF R(A,B,C,D,E,F) { A BC, D AEF } D+ = ABCDEF

D is a good primary key. A+ = ABC

Page 32: 1 Database Design Theory Which tables to have in a database Normalization

32

Another Violation of BCNF

Assume that we give each registration for a course a unique registration number

Reg(rid, sid, cid, sname, grade)

FDs rid sid, cid sid, cid rid, grade sid sname rid+ = All attributes

Page 33: 1 Database Design Theory Which tables to have in a database Normalization

33

Getting Tables into BCNF

Decompose according to the FD whose determinant is not a superkey.

In our example, sid sname This gives

Reg(rid, sid, cid, grade) Stud(sid, sname)

Not always possible to get tables into BCNF while preserving all functional dependencies.

Page 34: 1 Database Design Theory Which tables to have in a database Normalization

34

Example where BCNF is not possible

Consider R(A,B,C) { AB C, C B}

Not in BCNF because C is not a superkey. However, every decomposition of R fails to be

dependency preserving as we have to split up the attributes in AB C

Have to settle for 3NF.

Page 35: 1 Database Design Theory Which tables to have in a database Normalization

35

Multivalued Dependencies (MVDs)

In an FD, X Y, knowing the value of X means that you know the unique value for Y.

In an MVD, X Y, knowing the value of X means that you know the set of values from which Y can come.

Page 36: 1 Database Design Theory Which tables to have in a database Normalization

36

Example of MVD

Assume we have two streams for some course, taught by different instructors, and that for each course, we use two textbooks.

Example:course instructor text

CS51T Rao Date

Mansingh Korth

CS52S Rao Jackson

Stitt Rich

Page 37: 1 Database Design Theory Which tables to have in a database Normalization

37

Example of MVD Continued

Putting table in 1NF givescourse instructor text

CS51T Rao Date

CS51T Rao Korth

CS51T Mansingh Date

CS51T Mansingh Korth

CS52S Rao Jackson

CS52S Rao Rich

CS52S Stitt Jackson

CS52S Stitt Rich

With primary key course, instructor, text

Since no FD, in BCNF.

Page 38: 1 Database Design Theory Which tables to have in a database Normalization

38

Redundancy because of MVDs

However, still redundancy in the table because if <c,p,x> and <c,p’,x’> in table <c,p’,x> and

<c,p,x’> in table too. The table contains two multivalued dependencies:

course instructor course text

Danger of insertion and update anomalies

Page 39: 1 Database Design Theory Which tables to have in a database Normalization

39

Fourth Normal Form

Under 4NF, a relation should not contain two or more independent MVDs.

In other words, if there is a MVD, X Y, then X should be a superkey.

Page 40: 1 Database Design Theory Which tables to have in a database Normalization

40

Getting Tables into 4NF

Again, get a table into 4NF through decomposition so that each MVD is captured in a separate table.

Example: CP(course, instructor) CT(course, text)

Page 41: 1 Database Design Theory Which tables to have in a database Normalization

41

Normalization Reconsidered

Normalization helps avoid: Insertion anomalies Update anomalies Deletion anomalies

Normalization increases retrieval time for some queries.