Logical Database Design (1 of 3)

Logical Database Design (1 of 3)

John Ortiz

Lecture 6 Logical Database Design (1) 2

Introduction The logical design is a process of refining

DB schema based on functional dependencies.

Why is a logical design needed? Relations translated from ER may have

introduced data redundancy which leads to inconsistencies and update anomalies.

What is involved in logical design? Check for normal forms If necessary, decompose relations


A Database with RedundancyAssume all students of same major are

assigned to the same academic advisor.SID Name Major GPA Advisor Office 2011 John CS 3.4 Smith 3345 1235 Carl CS 3.2 Smith 3345 1003 Ken Math 3.5 Johnson 1120 1034 Bill Math 2.5 Johnson 1120 2005 Mary CS 2.9 Smith 3345 2078 Frank Math 4.0 Johnson 1120 1922 Scott Chem 3.45 Ford 2525

Students

(FD): major Advisor, Office What problems does it have?


Update Anomalies Insertion anomaly: How to add a new

major? Modification anomaly: What would happen

if we change office of Smith in the first tuple?

Deletion anomaly: What would happen if Scott is deleted?

SID Name Major GPA Advisor Office 2011 John CS 3.4 Smith 3345 1235 Carl CS 3.2 Smith 3345 1003 Ken Math 3.5 Johnson 1120 1034 Bill Math 2.5 Johnson 1120 2005 Mary CS 2.9 Smith 3345 2078 Frank Math 4.0 Johnson 1120 1922 Scott Chem 3.45 Ford 2525

Students


A Better Design Decompose Students into two relations

Major Advisor Office CS Smith 3345 Math Johnson 1120 Chem Ford 2525

Major_AdvisorSID Name Major GPA 2011 John CS 3.4 1235 Carl CS 3.2 1003 Ken Math 3.5 1034 Bill Math 2.5 2005 Mary CS 2.9 2078 Frank Math 4.0 1922 Scott Chem 3.45

Students

Decomposition can remove redundancy It may also cause problems if not done

carefully.


Cardinality Ratio The number of relationship instances that

an entity can participate in. There are three:

One-to-one (1:1) One-to-Many (1:M) Many-to-many (M:N)

Note: “1” may mean either zero or one and “many” may mean zero to the DBMS maximum


1:1 Relationships A one-to-one relationship simply means

that for any single instance of one entity, it may participate in zero or one instance in the related entity

E.g. Employee manages Department An employee may or may not be a department

manager, however, each department needs one manager

Employee has partial participation, generally a department would have total participation


1:1 Relationships The foreign key should be placed in the

entity with total participation If both have total participation, the location

of the foreign key is arbitrary Additionally, if both participations are total,

the two entities may be merged Why would you not want to do this?

Access Restrictions Special Attributes Referenced by another entity Performance in Distributed Databases


1:1 Relationships Access Restrictions Special Attributes Referenced by another entity

Salary Med RecsEmp_ID

AIRLINE EMPLOYEES

1

1

Flying HoursEmp_ID1 Pilots only!

Confidential!

Tool Bag ID Color

Emp_ID Job StartDate Address Tool Bag ID

Storage Bin Tool Bag ID

Maint only!


1:M Relationships A one-to-many relationship means that for

any single instance of one entity, it may participate in zero or more instances in the related entity, but the related entity may only participate in zero or one instance.

Subordinate to supervisor. Each subordinate generally will have one supervisor, but each supervisor usually may have zero or more employees.

Not absolute! Sometimes a subordinate may have >1 supervisors – consider this in design!


1:M Relationships The foreign key always goes in the “many”

part of the relationship! This is counter-intuitive, be careful! The FK must always reference a PK in the

related entity or be null. referential integrity

What results from placing the FK in the “one” part of the relationship?

Can part of a PK also be a FK? That coupled with the “existence dependancy”

indicates which type of entity?


1:M Relationship Design Thought Consider the attribute “class” of a

STUDENT entity. Given that all possible values are “Freshman”, “Sophomore”, “Junior”, “Senior”, or “Grad Student.” Is there any reason to place it in its own separate entity?


M:N Relationships A many-to-many relationship means that for

any single instance of one entity, it may participate in zero or more instances in the related entity, and the related entity may participate in zero or more instances as well.

Subordinate to supervisor. Consider allowing each subordinate to have zero or more supervisors.

Now where does the FK go?


M:N Relationships Actually, there are two FKs, one for each

entity. These two keys are placed in their own

entity. Together they make up the PK of that entity. In this case, all FKs must reference a valid

PK in the related entity – why? This entity may be called a link table, join

table, junction table, or simply many-to-many relationship.


M:N Relationships The link table may contain it’s own

attributes. For example, an evaluation by a supervisor. That evaluation may NOT go in the

subordinate entity, because there may be others from different supervisors.

It can not go in the supervisor entity because there may be more than one subordinate.

Only the subordinate/supervisor entities together can uniquely determine that evaluation.

Consider course grade vs. G.P.A.


Functional Dependencies (FDs) Let R(A1, ..., An) be a relation schema. Let

X and Y be two subsets of {A1, ..., An}, i.e., X, Y R. R satisfies a functional dependency, denoted by X Y, if in every legal instance r(R), for any pair of tuples t1 and t2 in r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].

If X Y, we say that “X (functionally) determines Y”.

X may also be called a determinant FDs are about ALL states of a relation.


An Example of FDs Consider a relation Student(SID, Name, Address, Phone,

Major)Some FDs may be: S NM, NA P, … Notations

Use single letter for attributes, i.e., S for SID, N for Name, A for Address, …

Use the relation name for the set of attributes of relation, i.e., Students for {SNAPM}


Properties of FDs X Y says redundant X-values always

cause the redundancy of Y-values. FDs are given by DBAs or DB designers. FDs are enforced/guaranteed by DBMS. Given an instance r of a relation R, we

can only determine that some FD is not satisfied by R, but can not determine if an FD is satisfied by R.


Which FD is satisfied? Which FD does R(A, B, C, D) satisfy, if the

following instance is the only instance of R?

A B, A C, C A, A D, B D, AB D

A B C D A1 B1 C1 D1 A1 B2 C1 D2 A2 B2 C2 D2 A2 B3 C2 D3 A3 B3 C2 D4

R


Closure of FDs Given some FDs, new FDs can often be

inferred. E.g., from S NM and M D, we can infer S D.

Let F be a set of FDs satisfied by R, and X, Y, K be subsets of attributes of R.

F (logically) implies X Y if whenever R satisfies (all FDs in) F, R also satisfies X Y.

The closure of F, denoted by F+, is the set of FDs implied by F, that is,

F+ = {X Y | F implies X Y}.


Candidate Key & Trivial FDs Let F be a set of FDs satisfied by R, and K

R. K is a candidate key (CK) of R if K R is in F+ (i.e., F implies K R); and there is no X K, such that X R is also

in F+ (minimality). (no part of K R ) An FD X Y is trivial if Y X.

Some trivial FDs are: AB B, A , .

Find the candidate keys and trivial FDs in the next example. What is ABC?


F+: An Example *Example: Let F = {AB C, C B} be a set of

FDs satisfied by R(A, B, C).F+ = {A , A A, AB , AB A, AB B,

AB C, AB AB, AB AC, AB BC, AB ABC, AC , AC A, AC B, AC C, AC AB, AC AC, AC BC, AC ABC, ABC , ABC A, ABC B, ABC C, ABC AB, ABC AC, ABC BC, ABC ABC, B , B B, BC , BC B, BC C, BC BC, C , C B, C C, C BC, }


Inference Rules for FDsLet F be a set of FDs satisfied by R, and X, Y,

Z R. Armstrong’s Axioms (1974) for deriving

new FDs(IR1) Reflexivity: If X Y, then X Y is

satisfied by R.(IR2) Augmentation: If X Y is satisfied by

R, then XZ YZ is also satisfied by R. (IR3) Transitivity: If X Y and Y Z are

satisfied by R, then so is X Z.


Compute FD Closure F+ Let F be a set of FDs. To compute F+, start

with FDs in F, repeatedly apply IR1-IR3, until no new FD can be derived.

Theorem: Armstrong's Axioms are sound and complete. Sound: no incorrect FD will be added to

F+. Complete: all FDs in F+ will be

generated.


Additional Axioms Additional rules derivable from

Armstrong's Axioms. (IR4) Decomposition: { X YZ } { X Y, X Z } (IR5) Union: { X Y, X Z } X YZ (IR6) Pseudotransitivity: { X Y, WY Z } WX Z


Closure of Attributes How to determine if F implies X Y? Method 1: Check if X Y is in F+.

Problem: F+ is too expensive to compute!

Method 2: Compute closure of X under F. X+ = { A |= X A F+ }

X+ is the set of attributes that are functionally determined by X under F.

Theorem: X Y F+ if and only if Y X+. Proof: Use IR4 & IR5.

Documents

Logical Database Design (1 of 3)