32
1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relation Mapping Step 2: Normalization: “Improving” the design

1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

Embed Size (px)

Citation preview

Page 1: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

1

Design Process - Where are we?

ConceptualDesign

ConceptualSchema

(ER Model)

LogicalDesign

Logical Schema(Relational Model)

Step 1: ER-to-RelationalMappingStep 2: Normalization:“Improving” the design

Page 2: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

2

Relations should have semantic unity Information repetition should be avoided

Anomalies: insertion, deletion, modification

Avoid null values as much as possible Difficulties with interpretation

don’t know, don’t care, known but unavailable, does not apply

Specification of joins

Avoid spurious joins

Relational Design Principles

Page 3: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

3

The TITLE, SALARY attribute values are repeated for each project that the engineer is involved in. Waste of space Complicates updates

Information Repetition

EMPLOYEE

ENO ENAME TITLE SALARY

J. Doe Elect. Eng. 40000M. Smith 34000

M. Smith

Analyst

Analyst 34000A. Lee Mech. Eng. 27000

A. Lee Mech. Eng. 27000J. Miller Programmer 24000

B. Casey Syst. Anal. 34000

L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2

E2E3

E3E4

E5

E6

E7

E8 J. Jones Syst. Anal. 34000

24

PJNO RESPDURATION

P1 Manager12P1 Analyst

P2 Analyst6P3 Consultant10

P4 Engineer48P2 Programmer18

P2 Manager24

P4 Manager48

P3 Engineer36

P3 Manager40

This example instance of EMPLOYEE relation violates one of the constraints in our earlier design. Which one?

Page 4: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

4

If a new employee joins the company, his name, title and salary cannot be inserted into the database unless he is assigned to a project. OR we live with nulls

Insertion Anomaly

ENO

EMPLOYEE

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000M. Smith 34000

M. Smith

Analyst

Analyst 34000A. Lee Mech. Eng. 27000

A. Lee Mech. Eng. 27000J. Miller Programmer 24000

B. Casey Syst. Anal. 34000L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2

E2E3

E3E4

E5E6

E7

E8 J. Jones Syst. Anal. 34000

24

PJNO RESPDURATION

P1 Manager12P1 Analyst

P2 Analyst6P3 Consultant10

P4 Engineer48P2 Programmer18

P2 Manager24P4 Manager48

P3 Engineer36

P3 Manager40

Page 5: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

5

Assume we had the following EMP-PROJ relation instead of the two separate EMPLOYEE and PROJECT relations It is difficult (impossible?) to store information about a new

project until an employee is assigned to it Previous problem still applies

Insertion Anomaly

ENO

EMP-PROJ

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000M. Smith 34000

M. Smith

Analyst

Analyst 34000A. Lee Mech. Eng. 27000

A. Lee Mech. Eng. 27000J. Miller Programmer 24000

B. Casey Syst. Anal. 34000

L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2

E2E3

E3E4

E5

E6

E7

E8 J. Jones Syst. Anal. 34000

24

PJNO RESPDURATION

P1 Manager12P1 Analyst

P2 Analyst6

P3 Consultant10

P4 Engineer48P2 Programmer18

P2 Manager24

P4 Manager48

P3 Engineer36

P3 Manager40

PNAME BUDGET

Instrumentation 150000Instrumentation 150000

Database Develop. 135000

Database Develop. 135000

Database Develop. 135000

CAD/CAM 250000

CAD/CAM 250000

CAD/CAM 250000

Maintenance 310000

Maintenance 310000

Page 6: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

6

If an engineer, who is the only employee on a project, leaves the company, his personal information cannot be deleted. OR the information about that project is lost.

Deletion Anomaly

ENO

EMP-PROJ

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000M. Smith 34000

M. Smith

Analyst

Analyst 34000A. Lee Mech. Eng. 27000

A. Lee Mech. Eng. 27000J. Miller Programmer 24000

B. Casey Syst. Anal. 34000

L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2

E2E3

E3E4

E5

E6

E7

E8 J. Jones Syst. Anal. 34000

24

PJNO RESPDURATION

P1 Manager12P1 Analyst

P2 Analyst6

P3 Consultant10

P4 Engineer48P2 Programmer18

P2 Manager24

P4 Manager48

P3 Engineer36

P3 Manager40

PNAME BUDGET

Instrumentation 150000Instrumentation 150000

Database Develop. 135000

Database Develop. 135000

Database Develop. 135000

CAD/CAM 250000

CAD/CAM 250000

CAD/CAM 250000

Maintenance 310000

Maintenance 310000

MGR

E1E1

E5E8

E6E5E5

E8

E8

E6

Page 7: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

7

If any attribute of project (say BUDGET of P1) is modified, all the tuples for all employees who work on that project need to be modified.

Modification Anomaly

ENO

EMP-PROJ

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000M. Smith 34000

M. Smith

Analyst

Analyst 34000A. Lee Mech. Eng. 27000

A. Lee Mech. Eng. 27000J. Miller Programmer 24000

B. Casey Syst. Anal. 34000

L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2

E2E3

E3E4

E5

E6

E7

E8 J. Jones Syst. Anal. 34000

24

PJNO RESPDURATION

P1 Manager12P1 Analyst

P2 Analyst6

P3 Consultant10

P4 Engineer48P2 Programmer18

P2 Manager24

P4 Manager48

P3 Engineer36

P3 Manager40

PNAME BUDGET

Instrumentation 150000Instrumentation 150000

Database Develop. 135000

Database Develop. 135000

Database Develop. 135000

CAD/CAM 250000

CAD/CAM 250000

CAD/CAM 250000

Maintenance 310000

Maintenance 310000

MGR

E1E1

E5E8

E6E5E5

E8

E8

E6

Page 8: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

8

What to do?

We take each relation individually and “improve” them in terms of the desired characteristics Normalization Normal forms

Page 9: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

9

Normalization

Normalization is a process of concept separation which applies a top-down methodology for producing a schema by subsequent refinements and decompositions.

Universal relation assumption 1NF to 3NF; 1NF to BCNF Questions:

How do we decompose a schema into a desirable normal form? What criteria the decomposed schemas should follow in order to

preserve the semantics of the original schema? Lossless decomposition: no information loss

Can we guarantee the quality of the decomposition Reconstructability: recover the original relation no spurious joins

What happens to queries? Processing time may increase due to joins Denormalization

Page 10: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

10

Normal Forms

5NF4NF

BCNF3NF

2NF1NF

All relations

Page 11: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

11

Normal Forms

Normal Forms can be defined according to keys and dependencies Functional Dependencies

Primary keys (1NF, 2NF, 3NF) All candidate keys ( 2NF, 3NF, BCNF)

Multivalued dependencies (4NF) Join dependencies (5NF) – we won’t talk about these

Page 12: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

12

Functional dependency (FD) X Y given a value of X, there is one value of Y

Multivalued dependency (MVD) X Y given a value of X, there is a set of values of Y.

Dependencies

Page 13: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

13

Given relation R defined over U = {A1, A2, ..., An} where X U, Y U. If, for all pairs of tuples t1 and t2 in any legal instance of relation scheme R,

t1[X] = t2[X] fi t1[Y] = t2[Y], then the functional dependency X Y holds in R. Example

In relation PROJ PJNO (PNAME, BUDGET, MGR)

In relation EMPLOYEE (ENO, PJNO) (ENAME, TITLE, SALARY, APT#,

STREET, CITY, DURATION, RESP) ENO (ENAME, TITLE, SALARY,APT#,STREET,CITY) TITLE SAL

Functional Dependency

Page 14: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

14

Inferencing over FDs

We would like to be able to infer from a given set of FDs.

Example:ENO (ENAME, TITLE, SALARY,APT#,STREET,CITY) fi

ENO ENAME

This requires a set of inference rules Armstrong’s rules (axioms) Additional ones

Page 15: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

15

Some Basics

Attributes Prime attribute is a member of any key Non-prime attribute is any attribute which is not prime

Full functional dependency A FD XY is a full functional dependency if X is minimal,

i.e., removal of any attribute A from X means the dependency does not hold anymore.

Formally - iff for any A X, (X{A})Y.

Partial functional dependency Formally - iff for any A X, (X{A})Y.

/X→fY

X→pY

Page 16: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

16

Normal Forms Based on FDs

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

First Normal Form (1NF)

1NF eliminates the relations within relations or relations as attributes of tuples.

eliminate the partial functional dependencies of non-prime attributes to key attributes

eliminate the transitive functional dependencies of non-prime attributes to key attributes

eliminate the partial and transitive functional dependencies of prime (key) attributes to key.

Page 17: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

17

All attribute values are atomic 1NF relation cannot have an attribute value that

is: a set of values (set-value) a tuple of values (nested relation)

This is a standard assumption in relational DBMSs

In object-oriented DBMSs this assumption is relaxed.

First Normal Form

Page 18: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

18

Two possible definitions: A relation R 2NF iff all non-prime attributes in R are

fully functionally dependent on primary key.

A relation R 2NF iff the attributes are either a candidate key, or

fully dependent on every key.

Partial functional dependencies cause problems.

Second Normal Form

Page 19: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

19

EMP-PROJ is in 1NF, but not in 2NF due to fd1, fd3 and fd4

Decompose it into EMP (ENO, ENAME, TITLE, SALARY,APT#,STREET,CITY) PROJECT (PJNO,PNAME,BUDGET,MGR) ASSIGN(ENO,PJNO,DURATION,RESP)

2NF – Example

fd1

fd2

EMP-PROJ

ENO ENAME TITLE SALARY PJNO RESPDURATIONPNAME BUDGETAPT# STREET CITY MGR

fd3

fd4

Page 20: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

20

2NF – Example

fd1

fd2

EMP

ENO ENAME TITLE SALARY APT# STREET CITY

PJNO PNAME BUDGET MGR

PROJECT

ENO DURATION RESPPJNO

ASSIGN

fd3

fd4

Page 21: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

21

Third Normal Form

Intuitively: A relation R NF iff R NF (i.e., every non-prime attribute is fully

functionally dependent on every key) No non-prime attribute of R is transitively dependent

on the prime key.

The issues is to remove the transitive dependencies

Page 22: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

22

Third Normal Form

Formally: A relation scheme R defined over U = {A1, A2, …, An} is in 3NF if for all functional dependencies that hold on R of the form XY, where X U and X U, at least one of the following holds: XY is a trivial functional dependency (i.e., Y X or

XY=U) X is a superkey for R Y is contained in a candidate key for R (Y is a prime

attribute) The last condition deals with transitive

dependencies.

Page 23: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

23

3NF – Example

EMP is not in 3NF because of fd2

TITLE SALARY but TITLE is not a superkey and SALARY is not prime

Problem is that ENO transitively determines SALARY (as well as directly determining it)

Solution:

fd1

fd2

EMP

ENO ENAME TITLE SALARY APT# STREET CITY

EMP

ENO ENAME TITLE APT# STREET CITY TITLE SALARY

PAY

fd1 fd2

Page 24: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

24

Boyce-Codd Normal Form

You can still have transitive dependencies in 3NF if the dependent attribute(s) are prime.

A 1NF relation R is in BCNF if for every functional dependency XY, X is a superkey.

Properties of BCNF All non-prime attributes are fully dependent on every

key. All prime attributes are fully dependent on the keys that

they do not belong to. No attribute is fully dependent on any set of non-prime

attributes.

Page 25: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

25

Formally: A relation scheme R defined over U = {A1, A2, …, An} is in BCNF if for all functional dependencies that hold on R of the form XA, where X U and AU, at least one of the following holds : XA is a trivial functional dependency X is a superkey for R

No transitive dependencies.

Boyce-Codd Normal Form

Page 26: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

26

BCNF – Example

Our relations are all in BCNF Assume, however, that in the PROJECT relation

there was another attribute HEADQTR, and the headquarters of a project was the location of the

manager, and project had to be headquartered at a different city

FDs would be

PJNO PNAME BUDGET MGR HEADQTR

PROJECT

which makes PROJECT in 3NF but not in BCNF

Page 27: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

27

Relation R defined over U = {A1, A1, ..., An}; XU, Y U, ZU. If for each value of X in R, there is a set of Y values and this set is only dependent on the value of X and independent of the value of Z, then Y has a multivalued dependency (MVD) on X (X Y).

Multivalued Dependency

w1

w2

p1p2p3

s1s2

s3s4

PSW

p4p5

Page 28: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

28

Relation R defined over U = {A1, A1, ..., An}; XU, Y U. Multivalued dependency

XY

holds on R if in any legal relation r of R, for all pairs of tuples t1 and t2 in r such that t1[X]= t2[X], there exists tuples t3 and t4 in r such that:

1. t1[X]= t2[X]= t3[X]= t4[X]

2. t3[Y]= t1[Y]

3. t3[U–Y]= t2[U–Y]

4. t4[Y]= t2[Y]

5. t4[U–Y]= t1[U–Y]

Multivalued Dependency – Formal

Page 29: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

29

Assume relationSKILL(ENO, PNO, PLACE)

Each engineer can work on each project; Each engineer can work at any of the branch offices; Each project can be carried out at any of the branch

offices. MVDs

ENO PNO ENO PLACE

MVD Example

Page 30: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

30

MVD Example

ENO PNO PLACE

SKILL

E1 P1 Toronto

E1 P1 New York

E1 P1 London

E1 P2 Toronto

E1 P2 New York

E1 P2 London

E2 P1 Toronto

E2 P1 New York

E2 P1 London

E2 P2 Toronto

E2 P2 New York

E2 P2 London

Note: Information repetition

Page 31: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

31

For each MVD of type XY in a relation R, X also functionally determines all the attributes of R.

If a relation is in BCNF and all MVDs are also FDs, the relation is in 4NF.

Formally: A relational scheme R defined over U={A1, A2, ..., An}

is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form X Y, where X U and YU , at least one of the following hold : X Y is a trivial multivalued dependency X is a superkey for scheme R

Fourth Normal Form (4NF)

Page 32: 1 Design Process - Where are we? Conceptual Design Conceptual Schema (ER Model) Logical Design Logical Schema (Relational Model) Step 1: ER-to-Relational

32

A relational scheme R defined over U = {A1, A2, ..., An} is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form X Y, where XU and YU, at least one of the following hold :X Y is a trivial multivalued dependency

X Y is a trivial MVD if Y X or X Y = R

X is a superkey for scheme R

4NF – Formal