1
Design Process - Where are we?
Conceptual Design: Conceptual Schema (ER Model)
Logical Design: Logical Schema (Relational Model)
  Step 1: ER-to-Relational Mapping
  Step 2: Normalization: “Improving” the design
2
Relational Design Principles
Relations should have semantic unity
Information repetition should be avoided
  Anomalies: insertion, deletion, modification
Avoid null values as much as possible
  Difficulties with interpretation: don’t know, don’t care, known but unavailable, does not apply
Specification of joins
  Avoid spurious joins
3
Information Repetition
The TITLE and SALARY attribute values are repeated for each project that the engineer is involved in.
  Waste of space
  Complicates updates

EMPLOYEE
ENO  ENAME      TITLE        SALARY  PJNO  RESP        DURATION
E1   J. Doe     Elect. Eng.  40000   P1    Manager     12
E2   M. Smith   Analyst      34000   P1    Analyst     24
E2   M. Smith   Analyst      34000   P2    Analyst     6
E3   A. Lee     Mech. Eng.   27000   P3    Consultant  10
E3   A. Lee     Mech. Eng.   27000   P4    Engineer    48
E4   J. Miller  Programmer   24000   P2    Programmer  18
E5   B. Casey   Syst. Anal.  34000   P2    Manager     24
E6   L. Chu     Elect. Eng.  40000   P4    Manager     48
E7   R. Davis   Mech. Eng.   27000   P3    Engineer    36
E8   J. Jones   Syst. Anal.  34000   P3    Manager     40

This example instance of the EMPLOYEE relation violates one of the constraints in our earlier design. Which one?
4
Insertion Anomaly
If a new employee joins the company, his name, title and salary cannot be inserted into the database unless he is assigned to a project.
  Otherwise we have to live with nulls.

EMPLOYEE
ENO  ENAME      TITLE        SALARY  PJNO  RESP        DURATION
E1   J. Doe     Elect. Eng.  40000   P1    Manager     12
E2   M. Smith   Analyst      34000   P1    Analyst     24
E2   M. Smith   Analyst      34000   P2    Analyst     6
E3   A. Lee     Mech. Eng.   27000   P3    Consultant  10
E3   A. Lee     Mech. Eng.   27000   P4    Engineer    48
E4   J. Miller  Programmer   24000   P2    Programmer  18
E5   B. Casey   Syst. Anal.  34000   P2    Manager     24
E6   L. Chu     Elect. Eng.  40000   P4    Manager     48
E7   R. Davis   Mech. Eng.   27000   P3    Engineer    36
E8   J. Jones   Syst. Anal.  34000   P3    Manager     40
5
Insertion Anomaly
Assume we had the following EMP-PROJ relation instead of the two separate EMPLOYEE and PROJECT relations.
  It is difficult (impossible?) to store information about a new project until an employee is assigned to it
  The previous problem still applies

EMP-PROJ
ENO  ENAME      TITLE        SALARY  PJNO  RESP        DURATION  PNAME              BUDGET
E1   J. Doe     Elect. Eng.  40000   P1    Manager     12        Instrumentation    150000
E2   M. Smith   Analyst      34000   P1    Analyst     24        Instrumentation    150000
E2   M. Smith   Analyst      34000   P2    Analyst     6         Database Develop.  135000
E3   A. Lee     Mech. Eng.   27000   P3    Consultant  10        CAD/CAM            250000
E3   A. Lee     Mech. Eng.   27000   P4    Engineer    48        Maintenance        310000
E4   J. Miller  Programmer   24000   P2    Programmer  18        Database Develop.  135000
E5   B. Casey   Syst. Anal.  34000   P2    Manager     24        Database Develop.  135000
E6   L. Chu     Elect. Eng.  40000   P4    Manager     48        Maintenance        310000
E7   R. Davis   Mech. Eng.   27000   P3    Engineer    36        CAD/CAM            250000
E8   J. Jones   Syst. Anal.  34000   P3    Manager     40        CAD/CAM            250000
6
Deletion Anomaly
If an engineer who is the only employee on a project leaves the company, his personal information cannot be deleted.
  Otherwise the information about that project is lost.

EMP-PROJ
ENO  ENAME      TITLE        SALARY  PJNO  RESP        DURATION  PNAME              BUDGET  MGR
E1   J. Doe     Elect. Eng.  40000   P1    Manager     12        Instrumentation    150000  E1
E2   M. Smith   Analyst      34000   P1    Analyst     24        Instrumentation    150000  E1
E2   M. Smith   Analyst      34000   P2    Analyst     6         Database Develop.  135000  E5
E3   A. Lee     Mech. Eng.   27000   P3    Consultant  10        CAD/CAM            250000  E8
E3   A. Lee     Mech. Eng.   27000   P4    Engineer    48        Maintenance        310000  E6
E4   J. Miller  Programmer   24000   P2    Programmer  18        Database Develop.  135000  E5
E5   B. Casey   Syst. Anal.  34000   P2    Manager     24        Database Develop.  135000  E5
E6   L. Chu     Elect. Eng.  40000   P4    Manager     48        Maintenance        310000  E6
E7   R. Davis   Mech. Eng.   27000   P3    Engineer    36        CAD/CAM            250000  E8
E8   J. Jones   Syst. Anal.  34000   P3    Manager     40        CAD/CAM            250000  E8
7
Modification Anomaly
If any attribute of a project (say BUDGET of P1) is modified, all the tuples for all employees who work on that project need to be modified.

EMP-PROJ
ENO  ENAME      TITLE        SALARY  PJNO  RESP        DURATION  PNAME              BUDGET  MGR
E1   J. Doe     Elect. Eng.  40000   P1    Manager     12        Instrumentation    150000  E1
E2   M. Smith   Analyst      34000   P1    Analyst     24        Instrumentation    150000  E1
E2   M. Smith   Analyst      34000   P2    Analyst     6         Database Develop.  135000  E5
E3   A. Lee     Mech. Eng.   27000   P3    Consultant  10        CAD/CAM            250000  E8
E3   A. Lee     Mech. Eng.   27000   P4    Engineer    48        Maintenance        310000  E6
E4   J. Miller  Programmer   24000   P2    Programmer  18        Database Develop.  135000  E5
E5   B. Casey   Syst. Anal.  34000   P2    Manager     24        Database Develop.  135000  E5
E6   L. Chu     Elect. Eng.  40000   P4    Manager     48        Maintenance        310000  E6
E7   R. Davis   Mech. Eng.   27000   P3    Engineer    36        CAD/CAM            250000  E8
E8   J. Jones   Syst. Anal.  34000   P3    Manager     40        CAD/CAM            250000  E8
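The modification anomaly can be made concrete with a small sketch. It keeps only three EMP-PROJ tuples and three columns, and the new budget value 175000 is a made-up number used purely for illustration; the helper name set_budget is likewise hypothetical.

```python
# Three tuples of the flattened EMP-PROJ relation, reduced to the columns needed here.
emp_proj = [
    {"ENO": "E1", "PJNO": "P1", "BUDGET": 150000},
    {"ENO": "E2", "PJNO": "P1", "BUDGET": 150000},
    {"ENO": "E2", "PJNO": "P2", "BUDGET": 135000},
]

def set_budget(rows, pjno, new_budget):
    """Update BUDGET in every tuple of the given project; return the number of tuples touched."""
    touched = 0
    for row in rows:
        if row["PJNO"] == pjno:
            row["BUDGET"] = new_budget
            touched += 1
    return touched

# Hypothetical update: one logical change to P1 has to be applied to two tuples.
touched = set_budget(emp_proj, "P1", 175000)
print(touched)
```

If the budget were stored once per project, the same change would touch a single tuple, and missing one of the copies could not leave the data inconsistent.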
8
What to do?
We take each relation individually and “improve” it in terms of the desired characteristics
  Normalization
    Normal forms
9
Normalization
Normalization is a process of concept separation which applies a top-down methodology for producing a schema by successive refinements and decompositions.
  Universal relation assumption
  1NF to 3NF; 1NF to BCNF
Questions:
  How do we decompose a schema into a desirable normal form?
  What criteria should the decomposed schemas follow in order to preserve the semantics of the original schema?
    Lossless decomposition: no information loss
  Can we guarantee the quality of the decomposition?
    Reconstructability: recover the original relation, with no spurious joins
  What happens to queries?
    Processing time may increase due to joins
    Denormalization
10
Normal Forms
[Figure: nested sets of relations] 5NF ⊂ 4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF ⊂ All relations
11
Normal Forms
Normal forms can be defined according to keys and dependencies
  Functional dependencies
    Primary keys (1NF, 2NF, 3NF)
    All candidate keys (2NF, 3NF, BCNF)
  Multivalued dependencies (4NF)
  Join dependencies (5NF): we won’t talk about these
12
Dependencies
Functional dependency (FD)
  X → Y: given a value of X, there is one value of Y
Multivalued dependency (MVD)
  X →→ Y: given a value of X, there is a set of values of Y
13
Functional Dependency
Given relation R defined over U = {A1, A2, ..., An} where X ⊆ U, Y ⊆ U. If, for all pairs of tuples t1 and t2 in any legal instance of relation scheme R,
  t1[X] = t2[X] ⇒ t1[Y] = t2[Y],
then the functional dependency X → Y holds in R.
Example
  In relation PROJ
    PJNO → (PNAME, BUDGET, MGR)
  In relation EMPLOYEE
    (ENO, PJNO) → (ENAME, TITLE, SALARY, APT#, STREET, CITY, DURATION, RESP)
    ENO → (ENAME, TITLE, SALARY, APT#, STREET, CITY)
    TITLE → SALARY
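The definition translates directly into a check over one instance. The following is a minimal sketch, assuming tuples are represented as Python dictionaries; the function name fd_holds and the four sample rows are illustrative choices. Note that a single instance can only refute an FD: a dependency holds on the scheme only if it holds in every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while disagreeing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare later rows to it.
        if seen.setdefault(x, y) != y:
            return False
    return True

rows = [
    {"ENO": "E1", "TITLE": "Elect. Eng.", "SALARY": 40000, "PJNO": "P1"},
    {"ENO": "E2", "TITLE": "Analyst", "SALARY": 34000, "PJNO": "P1"},
    {"ENO": "E2", "TITLE": "Analyst", "SALARY": 34000, "PJNO": "P2"},
    {"ENO": "E6", "TITLE": "Elect. Eng.", "SALARY": 40000, "PJNO": "P4"},
]

title_salary = fd_holds(rows, ["TITLE"], ["SALARY"])  # consistent with TITLE -> SALARY
eno_pjno = fd_holds(rows, ["ENO"], ["PJNO"])          # E2 works on both P1 and P2
print(title_salary, eno_pjno)
```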
14
Inferencing over FDs
We would like to be able to infer new FDs from a given set of FDs.
Example:
  ENO → (ENAME, TITLE, SALARY, APT#, STREET, CITY) ⇒ ENO → ENAME
This requires a set of inference rules
  Armstrong’s rules (axioms)
  Additional ones
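One common way to mechanize such inference is the attribute-closure computation: X → Y follows from a set of FDs exactly when Y is contained in the closure of X. The sketch below assumes FDs are given as pairs of frozensets; the names closure and implies are illustrative.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)

fds = [
    (frozenset({"ENO"}),
     frozenset({"ENAME", "TITLE", "SALARY", "APT#", "STREET", "CITY"})),
    (frozenset({"TITLE"}), frozenset({"SALARY"})),
]

print(implies(fds, {"ENO"}, {"ENAME"}))    # the example from the slide
print(implies(fds, {"TITLE"}, {"ENAME"}))  # not inferable from these FDs
```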
15
Some Basics
Attributes
  A prime attribute is a member of some key
  A non-prime attribute is any attribute that is not prime
Full functional dependency (written X →f Y)
  An FD X → Y is a full functional dependency if X is minimal, i.e., removal of any attribute A from X means the dependency does not hold anymore.
  Formally: iff for every A ∈ X, (X − {A}) ↛ Y.
Partial functional dependency (written X →p Y)
  Formally: iff for some A ∈ X, (X − {A}) → Y.
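The full/partial distinction can be tested with attribute closures. This is a sketch under the assumption that FDs are pairs of frozensets; the two FDs used are simplified from the EMPLOYEE example, and the name is_full is illustrative. Removing one attribute at a time is enough, because any smaller determining subset would also make some one-attribute removal succeed.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full(lhs, rhs, fds):
    """Return True if lhs -> rhs holds and no attribute can be dropped from lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the dependency does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

fds = [
    (frozenset({"ENO"}), frozenset({"ENAME"})),
    (frozenset({"ENO", "PJNO"}), frozenset({"DURATION"})),
]

full = is_full({"ENO", "PJNO"}, {"DURATION"}, fds)  # needs both attributes
partial = is_full({"ENO", "PJNO"}, {"ENAME"}, fds)  # ENO alone is enough
print(full, partial)
```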
16
Normal Forms Based on FDs
First Normal Form (1NF)
  eliminates relations within relations, i.e., relations as attributes of tuples
Second Normal Form (2NF)
  eliminates the partial functional dependencies of non-prime attributes on key attributes
Third Normal Form (3NF)
  eliminates the transitive functional dependencies of non-prime attributes on key attributes
Boyce-Codd Normal Form (BCNF)
  eliminates the partial and transitive functional dependencies of prime (key) attributes on the key
17
First Normal Form
All attribute values are atomic
A 1NF relation cannot have an attribute value that is:
  a set of values (set-valued)
  a tuple of values (nested relation)
This is a standard assumption in relational DBMSs
In object-oriented DBMSs this assumption is relaxed.
18
Second Normal Form
Two possible definitions:
  A relation R is in 2NF iff all non-prime attributes in R are fully functionally dependent on the primary key.
  A relation R is in 2NF iff every attribute is either part of a candidate key or fully dependent on every key.
Partial functional dependencies cause problems.
19
2NF – Example
EMP-PROJ is in 1NF, but not in 2NF due to fd1, fd3 and fd4
[Figure: EMP-PROJ(ENO, ENAME, TITLE, SALARY, PJNO, RESP, DURATION, PNAME, BUDGET, APT#, STREET, CITY, MGR) annotated with dependency arrows fd1, fd2, fd3 and fd4]
Decompose it into
  EMP(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY)
  PROJECT(PJNO, PNAME, BUDGET, MGR)
  ASSIGN(ENO, PJNO, DURATION, RESP)
20
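2NF – Example
EMP(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY)
PROJECT(PJNO, PNAME, BUDGET, MGR)
ASSIGN(ENO, PJNO, DURATION, RESP)
[Figure: the three relations annotated with dependency arrows; fd1 and fd2 are drawn on EMP, fd3 and fd4 on the other relations]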
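The decomposition into EMP, PROJECT and ASSIGN can be sketched as projections, with a natural join to confirm that the original tuples are recovered. The sketch uses three EMP-PROJ tuples and a subset of the columns; the helper names project, natural_join and as_set are illustrative.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dictionaries."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen

def natural_join(r, s):
    """Join two relations on all attributes they have in common."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                out.append({**a, **b})
    return out

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

emp_proj = [
    {"ENO": "E1", "ENAME": "J. Doe", "PJNO": "P1", "PNAME": "Instrumentation", "DURATION": 12},
    {"ENO": "E2", "ENAME": "M. Smith", "PJNO": "P1", "PNAME": "Instrumentation", "DURATION": 24},
    {"ENO": "E2", "ENAME": "M. Smith", "PJNO": "P2", "PNAME": "Database Develop.", "DURATION": 6},
]

emp = project(emp_proj, ["ENO", "ENAME"])
proj = project(emp_proj, ["PJNO", "PNAME"])
assign = project(emp_proj, ["ENO", "PJNO", "DURATION"])

# Joining the three pieces back together reproduces the original instance.
rejoined = natural_join(natural_join(assign, emp), proj)
print(as_set(rejoined) == as_set(emp_proj))
```

Each employee name and each project name is now stored once, which is what removes the repetition seen in EMP-PROJ.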
21
Third Normal Form
Intuitively: A relation R is in 3NF iff
  R is in 2NF (i.e., every non-prime attribute is fully functionally dependent on every key)
  No non-prime attribute of R is transitively dependent on the primary key.
The issue is to remove the transitive dependencies
22
Third Normal Form
Formally: A relation scheme R defined over U = {A1, A2, …, An} is in 3NF if for all functional dependencies that hold on R of the form X → Y, where X ⊆ U and Y ⊆ U, at least one of the following holds:
  X → Y is a trivial functional dependency (i.e., Y ⊆ X)
  X is a superkey for R
  Y is contained in a candidate key for R (Y is a prime attribute)
The last condition deals with transitive dependencies.
23
3NF – Example
EMP is not in 3NF because of fd2
  TITLE → SALARY, but TITLE is not a superkey and SALARY is not prime
  Problem is that ENO transitively determines SALARY (as well as directly determining it)
[Figure: EMP(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY) annotated with fd1 and fd2]
Solution:
  EMP(ENO, ENAME, TITLE, APT#, STREET, CITY), with fd1
  PAY(TITLE, SALARY), with fd2
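The 3NF test can be sketched with a brute-force candidate-key search, which is fine for schemas of this size. The FD sets given to each relation below are assumptions written out by hand (ENO determines the other EMP attributes, and TITLE → SALARY); the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal superkeys by trying subsets in order of increasing size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations_3nf(schema, fds):
    """Return (lhs, attribute) pairs that break the 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in rhs - lhs:  # attributes in lhs are the trivial part
            if not closure(lhs, fds) >= schema and a not in prime:
                bad.append((lhs, a))
    return bad

emp = {"ENO", "ENAME", "TITLE", "SALARY", "APT#", "STREET", "CITY"}
fd1 = (frozenset({"ENO"}), frozenset(emp - {"ENO"}))
fd2 = (frozenset({"TITLE"}), frozenset({"SALARY"}))

before = violations_3nf(emp, [fd1, fd2])

# After the split: EMP without SALARY, and PAY(TITLE, SALARY).
emp2 = emp - {"SALARY"}
after_emp = violations_3nf(emp2, [(frozenset({"ENO"}), frozenset(emp2 - {"ENO"}))])
after_pay = violations_3nf({"TITLE", "SALARY"}, [fd2])
print(before, after_emp, after_pay)
```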
24
Boyce-Codd Normal Form
You can still have transitive dependencies in 3NF if the dependent attribute(s) are prime.
A 1NF relation R is in BCNF if for every functional dependency X → Y, X is a superkey.
Properties of BCNF
  All non-prime attributes are fully dependent on every key.
  All prime attributes are fully dependent on the keys that they do not belong to.
  No attribute is fully dependent on any set of non-prime attributes.
25
Boyce-Codd Normal Form
Formally: A relation scheme R defined over U = {A1, A2, …, An} is in BCNF if for all functional dependencies that hold on R of the form X → A, where X ⊆ U and A ∈ U, at least one of the following holds:
  X → A is a trivial functional dependency
  X is a superkey for R
No transitive dependencies.
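The BCNF condition reduces to a superkey test on each left-hand side. The sketch below uses a generic relation R(A, B, C) with AB → C and C → B, a textbook-style example that is not taken from these slides: it is in 3NF because B is prime, yet not in BCNF because C is not a superkey.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]

# Hypothetical relation R(A, B, C), used only to illustrate the test.
schema = {"A", "B", "C"}
fds = [
    (frozenset({"A", "B"}), frozenset({"C"})),
    (frozenset({"C"}), frozenset({"B"})),
]

violations = bcnf_violations(schema, fds)
print(violations)
```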
26
BCNF – Example
Our relations are all in BCNF
Assume, however, that in the PROJECT relation there was another attribute HEADQTR, and
  the headquarters of a project was the location of the manager, and
  each project had to be headquartered in a different city
FDs would be
[Figure: PROJECT(PJNO, PNAME, BUDGET, MGR, HEADQTR) with the functional dependencies drawn as arrows]
which puts PROJECT in 3NF but not in BCNF
27
Multivalued Dependency
Relation R defined over U = {A1, A2, ..., An}; X ⊆ U, Y ⊆ U, Z ⊆ U. If for each value of X in R, there is a set of Y values and this set is only dependent on the value of X and independent of the value of Z, then Y has a multivalued dependency (MVD) on X (X →→ Y).
[Figure: example over attributes P, S, W with values p1 to p5, s1 to s4, and w1, w2]
28
Multivalued Dependency – Formal
Relation R defined over U = {A1, A2, ..., An}; X ⊆ U, Y ⊆ U. Multivalued dependency
  X →→ Y
holds on R if in any legal relation r of R, for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X], there exist tuples t3 and t4 in r such that:
1. t1[X] = t2[X] = t3[X] = t4[X]
2. t3[Y] = t1[Y]
3. t3[U–Y] = t2[U–Y]
4. t4[Y] = t2[Y]
5. t4[U–Y] = t1[U–Y]
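The five conditions can be checked on one instance as follows. This is a sketch that assumes X and Y are disjoint and that tuples are plain Python tuples ordered like the attribute list; because the loop visits every ordered pair (t1, t2), finding t3 for each pair also covers t4 by symmetry. The sample data follows the SKILL(ENO, PNO, PLACE) example, restricted to engineer E1.

```python
from itertools import product

def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one instance; x and y are disjoint lists of attribute names."""
    idx = {a: i for i, a in enumerate(attrs)}
    z = [a for a in attrs if a not in x and a not in y]

    def pick(t, names):
        return tuple(t[idx[a]] for a in names)

    present = {(pick(t, x), pick(t, y), pick(t, z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, x) == pick(t2, x):
                # t3 must combine the Y part of t1 with the remaining part of t2.
                if (pick(t1, x), pick(t1, y), pick(t2, z)) not in present:
                    return False
    return True

attrs = ("ENO", "PNO", "PLACE")
skill = list(product(["E1"], ["P1", "P2"], ["Toronto", "New York", "London"]))

complete = mvd_holds(skill, attrs, ["ENO"], ["PNO"])

# Dropping one combination breaks the dependency.
damaged = [t for t in skill if t != ("E1", "P2", "London")]
broken = mvd_holds(damaged, attrs, ["ENO"], ["PNO"])
print(complete, broken)
```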
29
MVD Example
Assume relation SKILL(ENO, PNO, PLACE)
  Each engineer can work on each project;
  Each engineer can work at any of the branch offices;
  Each project can be carried out at any of the branch offices.
MVDs
  ENO →→ PNO
  ENO →→ PLACE
30
MVD Example

SKILL
ENO  PNO  PLACE
E1   P1   Toronto
E1   P1   New York
E1   P1   London
E1   P2   Toronto
E1   P2   New York
E1   P2   London
E2   P1   Toronto
E2   P1   New York
E2   P1   London
E2   P2   Toronto
E2   P2   New York
E2   P2   London

Note: Information repetition
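One standard way to remove this repetition is to split SKILL along its two MVDs. The slides do not show this step, so the sketch below is an assumption: the two projections, here called eno_pno and eno_place, are hypothetical relations, and the join confirms that nothing is lost.

```python
from itertools import product

engineers = ["E1", "E2"]
projects = ["P1", "P2"]
places = ["Toronto", "New York", "London"]

# The SKILL instance from the slide: every combination, 12 tuples in total.
skill = set(product(engineers, projects, places))

# Project onto (ENO, PNO) and (ENO, PLACE).
eno_pno = {(e, p) for e, p, _ in skill}
eno_place = {(e, c) for e, _, c in skill}

# Natural join on ENO rebuilds SKILL exactly.
rejoined = {(e, p, c) for (e, p) in eno_pno for (e2, c) in eno_place if e == e2}

print(len(skill), len(eno_pno), len(eno_place), rejoined == skill)
```

The two projections together hold 10 tuples instead of 12, and the gap widens as the number of projects and places per engineer grows.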
31
Fourth Normal Form (4NF)
For each MVD of type X →→ Y in a relation R, X also functionally determines all the attributes of R.
If a relation is in BCNF and all MVDs are also FDs, the relation is in 4NF.
Formally: A relational scheme R defined over U = {A1, A2, ..., An} is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form X →→ Y, where X ⊆ U and Y ⊆ U, at least one of the following holds:
  X →→ Y is a trivial multivalued dependency
  X is a superkey for scheme R
32
4NF – Formal
A relational scheme R defined over U = {A1, A2, ..., An} is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form X →→ Y, where X ⊆ U and Y ⊆ U, at least one of the following holds:
  X →→ Y is a trivial multivalued dependency
    X →→ Y is a trivial MVD if Y ⊆ X or X ∪ Y = U
  X is a superkey for scheme R