
Lecture 5

CS4411: Databases II

Agenda

• Exercises from Relational Algebra
• Functional Dependencies

Relational Algebra: Exercises

Review of Operators

• Select $\sigma_{\text{selection condition}}(R)$
• Project $\pi_{\text{attribute list}}(R)$
• Rename $\rho_{\text{new schema}}(R)$
• Union $R \cup S$
• Intersection $R \cap S$
• Difference $R - S$
• Cross product $R \times S$
• Join $R \bowtie_{\text{join condition}} S$
• Natural join $R \bowtie S$
• Division $R \div S$

Exercises

Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:

• Union: 𝑅 ∪ 𝑆

If all the tuples of R and S are different, then the union has maximum:

n + m tuples.

The minimum number of tuples that can appear in the result occurs if every tuple of one relation also appears in the other. Then the union has:

max(n, m) tuples.
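A quick sanity check of both bounds, as a minimal sketch: it assumes set semantics (no duplicate tuples, so the union removes repeats) and uses small made-up relations.

```python
# Set-semantics sketch; the sample relations below are hypothetical.
R = {(1, "a"), (2, "b"), (3, "c")}    # n = 3
S_disjoint = {(4, "d"), (5, "e")}     # m = 2, no tuple in common with R
S_contained = {(1, "a"), (2, "b")}    # m = 2, every tuple also appears in R

# Maximum: all tuples different, so |R ∪ S| = n + m
max_union = R | S_disjoint
# Minimum: one relation contained in the other, so |R ∪ S| = max(n, m)
min_union = R | S_contained

print(len(max_union), len(R) + len(S_disjoint))       # 5 5
print(len(min_union), max(len(R), len(S_contained)))  # 3 3
```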

Exercises


• Natural join: 𝑅 ⋈ 𝑆

If all the tuples in one relation can pair successfully with all the tuples in the other relation, then the natural join has:

n * m tuples.

The minimum number of tuples that can appear in the result occurs if no tuple of one relation can pair successfully with any tuple of the other relation. Then the natural join has:

zero tuples.
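The two extremes of the natural join can be reproduced with a small hand-written join. This is a sketch assuming set semantics, with each relation represented as a (schema, set of tuples) pair and hypothetical sample data; `natural_join` is an illustrative helper, not a library function.

```python
# Relations are (schema tuple, set of value tuples); sample data is hypothetical.
def natural_join(rel1, rel2):
    (s1, rows1), (s2, rows2) = rel1, rel2
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t in rows1:
        for u in rows2:
            # t and u pair successfully only if they agree on every common attribute
            if all(t[s1.index(a)] == u[s2.index(a)] for a in common):
                out.add(t + tuple(u[s2.index(a)] for a in extra))
    return s1 + tuple(extra), out

R = (("A", "B"), {(1, "x"), (2, "x")})                        # n = 2
S_match = (("B", "C"), {("x", 10), ("x", 20), ("x", 30)})     # m = 3, every pair matches
S_nomatch = (("B", "C"), {("y", 10), ("y", 20), ("y", 30)})   # m = 3, no pair matches

_, max_rows = natural_join(R, S_match)    # n * m = 6 tuples
_, min_rows = natural_join(R, S_nomatch)  # 0 tuples
print(len(max_rows), len(min_rows))       # 6 0
```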

Exercises


• $\sigma_C(R) \times S$

If the condition C selects all the tuples of R, then the result will contain: n * m tuples. If the condition C selects none of the tuples of R, then the result will contain: zero tuples.
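A sketch of the same bounds for $\sigma_C(R) \times S$, assuming set semantics and hypothetical data; the lambdas passed as `condition` play the role of C.

```python
from itertools import product

# Hypothetical sample relations: R has n = 3 tuples, S has m = 2 tuples.
R = {(1, "red"), (2, "blue"), (3, "red")}
S = {("x",), ("y",)}

def select_then_cross(condition, r, s):
    selected = {t for t in r if condition(t)}        # sigma_C(R)
    return {t + u for t, u in product(selected, s)}  # sigma_C(R) x S

all_selected = select_then_cross(lambda t: True, R, S)              # C keeps every tuple: n * m
none_selected = select_then_cross(lambda t: t[1] == "green", R, S)  # C keeps no tuple: 0
some_selected = select_then_cross(lambda t: t[1] == "red", R, S)    # somewhere in between

print(len(all_selected), len(none_selected), len(some_selected))    # 6 0 4
```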

Exercises

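We define r as the schema of R and s as the schema of S. Each of the following expressions returns the tuples of R that join with at least one tuple of S (the semijoin $R \ltimes S$):

• $\pi_r(R \bowtie S)$
• $R \bowtie \delta(\pi_{r \cap s}(S))$, where $\delta$ is the duplicate-elimination operator
• $R - (R - \pi_r(R \bowtie S))$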
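To see that the three expressions agree, here is a minimal sketch under set semantics (where δ is a no-op, since Python sets never hold duplicates). The helpers `project`, `natural_join` and `minus` are hypothetical names for the corresponding operators, and the sample R and S are made up.

```python
# Set-semantics sketch: a relation is (schema tuple, set of value tuples).
# The sample data is hypothetical.
def project(attrs, rel):
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}

def natural_join(rel1, rel2):
    (s1, rows1), (s2, rows2) = rel1, rel2
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t in rows1:
        for u in rows2:
            if all(t[s1.index(a)] == u[s2.index(a)] for a in common):
                out.add(t + tuple(u[s2.index(a)] for a in extra))
    return s1 + tuple(extra), out

def minus(rel1, rel2):
    assert rel1[0] == rel2[0], "difference needs identical schemas"
    return rel1[0], rel1[1] - rel2[1]

R = (("A", "B"), {(1, "x"), (2, "y"), (3, "z")})
S = (("B", "C"), {("x", 10), ("x", 20), ("z", 30)})
r, s = R[0], S[0]
r_and_s = tuple(a for a in r if a in s)

e1 = project(r, natural_join(R, S))                      # pi_r(R ⋈ S)
e2 = natural_join(R, project(r_and_s, S))                # R ⋈ δ(pi_{r∩s}(S)); sets have no duplicates
e3 = minus(R, minus(R, project(r, natural_join(R, S))))  # R − (R − pi_r(R ⋈ S))

print(e1 == e2 == e3, sorted(e1[1]))  # True [(1, 'x'), (3, 'z')]
```

Under bag semantics the expressions can differ in duplicate counts, which is why δ appears in the second one.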

Exercises

We define r as the schema of R. The expression

$R - \pi_r(R \bowtie S)$

returns the tuples of R that join with no tuple of S.
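A small sketch of $R - \pi_r(R \bowtie S)$, assuming set semantics and made-up data, with rows as dictionaries keyed by attribute name (the helper `as_tuple` is illustrative):

```python
# Hypothetical sample data; each row is a dict from attribute name to value.
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 3, "B": "z"}]
S = [{"B": "x", "C": 10}, {"B": "z", "C": 30}]

r, s = set(R[0]), set(S[0])   # schemas of R and S
common = r & s

def as_tuple(row, attrs):
    # Hashable form of a row restricted to attrs, in a fixed attribute order
    return tuple((a, row[a]) for a in sorted(attrs))

# pi_r(R ⋈ S): R-tuples that agree with some S-tuple on the common attributes
joined = {as_tuple(t, r) for t in R for u in S if all(t[a] == u[a] for a in common)}
# R − pi_r(R ⋈ S): R-tuples that join with no S-tuple
dangling = {as_tuple(t, r) for t in R} - joined

print(dangling)  # {(('A', 2), ('B', 'y'))}
```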

Exercises

• No manufacturer of PCs may also make laptops:

$\pi_{maker}(\sigma_{type = 'laptop'}(Product)) \cap \pi_{maker}(\sigma_{type = 'pc'}(Product)) = \emptyset$
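The constraint can be checked on an instance by computing both projections and testing that their intersection is empty. A sketch, assuming Product rows of the form (maker, model, type); the model attribute and the sample rows are assumptions for illustration.

```python
# Hypothetical rows of Product(maker, model, type); "model" is an assumed attribute.
products = [
    ("A", 1001, "pc"),
    ("A", 1002, "pc"),
    ("B", 2001, "laptop"),
    ("C", 3001, "printer"),
]

def makers_of(kind, rows):
    # pi_maker(sigma_{type = kind}(Product))
    return {maker for maker, _model, type_ in rows if type_ == kind}

def no_pc_maker_makes_laptops(rows):
    # The constraint holds when the intersection is empty
    return not (makers_of("laptop", rows) & makers_of("pc", rows))

print(no_pc_maker_makes_laptops(products))                            # True
print(no_pc_maker_makes_laptops(products + [("A", 1003, "laptop")]))  # False
```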

Functional Dependencies

• The set of relation schemas obtained by translating from an E/R diagram might need improvement
  – modeling with E/R diagrams is an art, not a science; it relies on experience and intuition
  – multiple alternative designs are possible; how to choose?

• Problems caused by redundant storage of information
  – wasted space
  – anomalies when updating, inserting or deleting tuples

Basic idea: replace a relation schema with a collection of "smaller" schemas. This is called decomposition.

Improving Relation Schemas

• There is a theory for systematically guiding the improvement of relational designs, called normalization

• Normalization uses the notion of integrity constraints (ICs) on the information – functional dependencies – multi-valued dependencies – referential integrity constraints

Improving Relation Schemas

First Normal Form (1NF)

A database schema is in First Normal Form if all tables are flat.

Student (not flat)
Name   GPA  Courses
Alice  3.8  Math, DB, OS
Bob    3.7  DB, OS
Carol  3.9  Math, OS

Flat version:

Student
Name   GPA
Alice  3.8
Bob    3.7
Carol  3.9

Course
Course
Math
DB
OS

Takes
Student  Course
Alice    Math
Carol    Math
Alice    DB
Bob      DB
Alice    OS
Carol    OS

May need to add keys.

• Based on Functional Dependencies – 2nd Normal Form (obsolete) – 3rd Normal Form – Boyce Codd Normal Form (BCNF)

• Based on Multivalued Dependencies – 4th Normal Form

• Based on Join Dependencies – 5th Normal Form

Discuss next

Normal Forms

• A form of constraint (hence, part of the schema) • Finding them is part of the database design

Functional Dependencies

Functional Dependency: A1, A2, …, An → B1, B2, …, Bm

Meaning: if two tuples agree on the attributes A1, A2, …, An, then they must also agree on the attributes B1, B2, …, Bm. We say that A1, …, An "functionally determine" B1, …, Bm.

Definition: A1, ..., An → B1, ..., Bm holds in R if:

$\forall t, t' \in R,\ (t.A_1 = t'.A_1 \wedge \dots \wedge t.A_n = t'.A_n \Rightarrow t.B_1 = t'.B_1 \wedge \dots \wedge t.B_m = t'.B_m)$

[Figure: two tuples t and t' of R; if t and t' agree on A1 … An, then t and t' agree on B1 … Bm.]

Functional Dependencies

• This relation satisfies C→T but does not satisfy S→T

• In general, a schema in which a relation R has two sets of attributes A, B such that A→B holds but A is not a superkey is problematic!

S     C      T
Bart  Math   Mrs. Krabappel
Lisa  Math   Mrs. Krabappel
Lisa  Logic  Ms. Hoover

Example

• EmpID → Name, Phone, Position
• Position → Phone
• but Phone ↛ Position (phone 1234 belongs to both a Clerk and a Lawyer)

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1881  John   9876   SalesRep
E1111  Smith  9876   SalesRep
E9999  Mary   1234   Lawyer

Example

FDs are constraints:
• On some instances they hold
• On others they don't

Consider these FDs on Product(name, category, color, department, price):
name → color
category → department
color, category → price

Does this instance satisfy all the FDs?

name     category  color  department  price
Gizmo    Gadget    Green  Toys        49
Tweaker  Gadget    Green  Toys        99

No: color, category → price doesn't hold (both tuples have color Green and category Gadget, but different prices).

Example

Consider the same FDs. Does this instance satisfy all the FDs?

name     category    color  department    price
Gizmo    Gadget      Green  Toys          49
Tweaker  Gadget      Black  Toys          99
Gizmo    Stationary  Green  Office-supp.  59

Yes!
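Checking whether an FD holds on an instance only needs the definition: compare every pair of tuples. A minimal sketch applied to the two Product instances above; the function and variable names are illustrative.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if every two rows that agree on lhs also agree on rhs."""
    for t, u in combinations(rows, 2):
        if all(t[a] == u[a] for a in lhs) and any(t[b] != u[b] for b in rhs):
            return False
    return True

FDS = [(["name"], ["color"]),
       (["category"], ["department"]),
       (["color", "category"], ["price"])]
COLS = ["name", "category", "color", "department", "price"]

def table(*tuples):
    return [dict(zip(COLS, t)) for t in tuples]

first = table(("Gizmo", "Gadget", "Green", "Toys", 49),
              ("Tweaker", "Gadget", "Green", "Toys", 99))
second = table(("Gizmo", "Gadget", "Green", "Toys", 49),
               ("Tweaker", "Gadget", "Black", "Toys", 99),
               ("Gizmo", "Stationary", "Green", "Office-supp.", 59))

def satisfies_all(rows):
    return all(fd_holds(rows, lhs, rhs) for lhs, rhs in FDS)

print(satisfies_all(first), satisfies_all(second))  # False True
```

A check like this can only show that an instance violates an FD; an instance that happens to satisfy an FD does not make it a constraint of the schema.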

If some FDs are satisfied, then others are satisfied too.

If all these FDs are true:
name → color
category → department
color, category → price

then this FD also holds:
name, category → price

Why? We say that the new FD is implied.

Inference

Splitting rule and Combining rule

A1, A2, …, An → B1, B2, …, Bm

is equivalent to the set of FDs:

A1, A2, …, An → B1
A1, A2, …, An → B2
. . .
A1, A2, …, An → Bm

Inference

Trivial Rule

A1, A2, …, An → Ai, where i = 1, 2, ..., n

(in particular, Ai → Ai)

Inference

Transitive Closure Rule

If A1, A2, …, An → B1, B2, …, Bm
and B1, B2, …, Bm → C1, C2, …, Cp
then A1, A2, …, An → C1, C2, …, Cp

Inference

From:
1. name → color
2. category → department
3. color, category → price

To:
name, category → price

Inferred FD                            Which rule did we apply?
4. name, category → name               Trivial rule
5. name, category → color              Transitivity on 4, 1
6. name, category → category           Trivial rule
7. name, category → color, category    Split/combine on 5, 6
8. name, category → price              Transitivity on 3, 7

Example

These inference rules are called Armstrong's axioms:

• Reflexivity:
  – if X is a set of attributes of R and Y ⊆ X, then X → Y
• Insertion (Augmentation):
  – if R satisfies X → Y for two sets of attributes X, Y of R, then for every set of attributes Z in R it holds that XZ → YZ
• Transitivity:
  – if X → Y and Y → Z both hold for a relation R, then X → Z also holds

Armstrong's Axioms

For a set of dependencies F and another dependency X → Y, we say that X → Y can be deduced from F, denoted F ⊢ X → Y, if X → Y can be inferred from F using only Armstrong's axioms.

Armstrong's Axioms

The following inference rules follow from Armstrong's axioms (and can be inferred from them):
• Unification: if X → Y and X → Z both hold, then X → YZ holds
• Split: if X → YZ holds, then X → Y and X → Z both hold
• Pseudo-transitivity: if X → Y and YW → Z both hold, then XW → Z holds

• No need for FDs with > 1 attribute on the right.
  – But sometimes it is convenient to combine FDs as a shorthand.
  – Example: name -> addr and name -> wife becomes name -> addr wife
• > 1 attribute on the left may be essential!
  – Example: store candy -> price

FD's with Multiple Attributes

• Given a schema R and a set of functional dependencies F:
  – A superkey of R is a set of attributes K ⊆ R such that F ⊨ K → R
  – A key of R is a set of attributes K ⊆ R such that:
    1. K is a superkey of R
    2. No proper subset of K is a superkey of R

• A key is also called a minimal key or an admissible key
  – As hinted before, keys are useful in identifying problematic schemas, e.g., the existence of a dependency X → Y in which X is not a superkey

Keys

K is a superkey for relation R if K functionally determines all of R.

Consumers(name, addr, candiesLiked, manf, favCandy)
• {name, candiesLiked} is a superkey because together these attributes determine all the other attributes:
  – name -> addr favCandy
  – candiesLiked -> manf

Example

• {name, candiesLiked} is a key because neither {name} nor {candiesLiked} is a superkey. – name doesn’t -> manf; – candiesLiked doesn’t -> addr.

• There are no other keys, but lots of superkeys. – Any superset of {name, candiesLiked}.


• Keys in E/R concern entities.
• Keys in relations concern tuples.
• Usually, one tuple corresponds to one entity, so the ideas are the same.
• But in poor relational designs, one entity can become several tuples, so E/R keys and relational keys are different.

E/R and Relational Keys

Relational key = {name, candiesLiked}. But in E/R, name is a key for Consumers, and candiesLiked is a key for Candies. Note: 2 tuples for the Jane entity and 2 tuples for the Twizzlers entity.

name   addr        candiesLiked  manf     favCandy
Jane   Voyager     Twizzlers     Hershey  Smarties
Jane   Voyager     Smarties      Nestle   Smarties
Spock  Enterprise  Twizzlers     Hershey  Twizzlers

Example Data

Suppose the schema was obtained from an E/R diagram.
• If relation R came from an entity set E, then the key for R is the keys of E.
• If R came from a binary relationship from E1 to E2:
  – many-many: the key for R is the keys of E1 and E2
  – many-one: the key for R is the keys for E1 (but not the keys for E2)
  – one-one: a key for R is the keys for E1; another key for R is the keys for E2

Discovering Keys

[Figure: E/R diagram with entity sets Consumers(name, addr) and Candies(name, manf), and relationships Likes, Favorite, Married (roles husband, wife) and Buddies (roles 1, 2).]

Likes(consumer, candy)      key: consumer candy
Favorite(consumer, candy)   key: consumer
Married(husband, wife)      keys: husband or wife
Buddies(name1, name2)       key: name1 name2

Key Example

Example: "no two courses can meet in the same room at the same time" tells us: hour, room -> course
• Ultimately, FDs and keys come from the semantics of the application!
• FDs and keys apply to the schema, not to specific instantiations of the relations

More FD's from Application

• X+ : Closure of an attribute set X – The set of all attributes that are determined by X

• K: a key

– minimum set of attributes that determines all attributes

• F+ : Closure of a dependency set F – The set of all dependencies that are implied from F

• Fmin: a minimum cover of a dependency set F

– a minimum set of FDs that is equivalent to F

Some Important Concepts

• We already know that if we have FDs A -> B and B -> C, then it is also true that A -> C. • What about a complete chain of such deductions? • This is called the dependency set closure

Closure of a Dependency Set

In general, we need to consider: • Dependency set closure (F+) • Attributes set closure (X+) • Attribute closure (A+)

[Figure: a dependency set F (f1, f2, f3) inside its closure F+; F implies f.]

The closures of Dependency set and Attributes sets are used to define criteria for the goodness of a decomposition

The size of F+ could be exponential in the size of F.

Sets F1 and F2 of functional dependencies are considered equivalent if they have the same closure (i.e. F1+ = F2+).

• Given a relation schema R and a set F of FDs – is some FD logically implied by this set F?

• Example – R = {A,B,C,G,H,I} – F = {A ->B, A ->C, CG -> H, CG -> I, B -> H} – would A ->H be logically implied by F? – yes (you can prove this using the definition of FD: A -> B and B -> H give A -> H by transitivity)

Closure of F: F+ = all FDs logically implied by F • How to compute F+?

– we can use Armstrong's axioms

Closure of a Dependency Set

Given a set F of functional dependencies, any set of functional dependencies that’s equivalent to F is called a cover (basis)

– We limit the possibilities by requiring that each dependency has a single attribute on the right-hand side

– How many bases are there for a relation R with n functional dependencies in F?

Cover (Basis)

Recall: Sets F1 and F2 of functional dependencies are considered equivalent if they have the same closure (i.e. F1+ = F2+).

A minimal cover (basis) for a relation R is a basis B that satisfies the following conditions:
• All functional dependencies in B have singleton right-hand sides
• If any functional dependency is removed from B, the result is no longer a basis
• If any left-hand side attribute is removed from a functional dependency of B, the result is no longer a basis

Minimal cover/basis (Minimal set of FDs)

• Every set of FDs has an equivalent minimal set
• There can be several equivalent minimal sets
• There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F of FDs

The closure of an attributes set has an important role in the design of relational schemas

– For example, K is a superkey of R iff K+=R – Thus, we need an algorithm to compute it.

Closure of an Attributes Set

A key K for a given relation R is a minimal set of attributes A1, A2, ..., An such that closure K+={A1, A2, ..., An}+ is the set of all attributes of R

Given a set of attributes X = {A1, …, An} and a set of FDs F: the closure X+ is the set of attributes B such that F implies {A1, …, An} → B.

In English: Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X

The following simple algorithm computes the closure $X_F^+$ for a given dependency set F and attributes set X:

AList := X
Repeat
    For every Y → Z ∈ F do
        If Y ⊆ AList then
            AList := AList ∪ Z
Until no change to AList
Return AList
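The pseudocode translates almost line for line into Python. A sketch, assuming each FD is a (left side, right side) pair and single-letter attribute names, so that a string such as "BC" stands for the set {B, C}; the usage lines reproduce the closure example and the AB → D test that follow.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (Y, Z) pairs meaning Y -> Z."""
    alist = set(attrs)                    # AList := X
    changed = True
    while changed:                        # Repeat ... Until no change to AList
        changed = False
        for y, z in fds:                  # For every Y -> Z in F
            if set(y) <= alist and not set(z) <= alist:   # If Y ⊆ AList and Z adds something
                alist |= set(z)           # AList := AList ∪ Z
                changed = True
    return alist

# Single-letter attributes, so a string such as "BC" stands for the set {B, C}.
F1 = [("A", "C"), ("BC", "A"), ("AC", "D"), ("CE", "F")]
print(sorted(closure("AB", F1)))          # ['A', 'B', 'C', 'D']

F2 = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
ab_plus = closure("AB", F2)
print(sorted(ab_plus), "D" in ab_plus)    # ['A', 'B', 'C', 'D', 'E'] True
```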

Closure of an Attributes Set

Compute the closure of X = {A,B} for the dependency set F = {A → C, BC → A, AC → D, CE → F}

– Initialization: AList = {A,B}
– From A → C, we get AList = {A,B,C}
– From AC → D, we get AList = {A,B,C,D}
– The other dependencies do not add anything
– The closure of the attributes set is $X_F^+ = \{A,B,C,D\}$

Execution Example

Suppose R(A,B,C,D,E,F) and the FDs are F = {AB → C, BC → AD, D → E, CF → B}. We wish to test whether AB → D follows from F. We compute {A,B}+, which is {A,B,C,D,E}. Since D is a member of the closure, we conclude that AB → D follows.

Closure of an Attributes Set

How to find if a set of attributes X is a superkey / key?
• Compute X+
• If X+ = all attributes, then X is a superkey
• Consider minimal superkeys (called keys)
• Keys are also called Candidate Keys (CK)

Note: there can be exponentially many candidate keys!

Computing Superkeys and Keys

Definitions:
• Prime attribute: an attribute A of R that is a member of some candidate key
• Non-prime attribute: an attribute that is not a member of any candidate key

K is a superkey for relation R if K functionally determines all of R.

R = Product(name, price, category, color)
FDs: name, category → price; category → color

What is the key? Let's try (name, category) as a key and compute its closure:
(name, category)+ = (name, category, price, color) = the set R of all attributes
• This is a superkey.
• This is a key, since neither name nor category alone is a superkey.

Hence (name, category) is a key.

Computing Keys
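Finding keys by brute force follows the recipe above: compute X+ for candidate sets X in order of increasing size and keep the minimal ones whose closure is all of R. A sketch, exponential in the number of attributes (consistent with the note that there can be exponentially many candidate keys); the helper names are illustrative and FDs are assumed to be (left, right) pairs of sets.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: try attribute sets by increasing size (exponential in |schema|)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue                  # contains a smaller key: superkey, but not minimal
            if closure(cand, fds) >= set(schema):
                keys.append(cand)
    return keys

schema = ["name", "price", "category", "color"]
fds = [({"name", "category"}, {"price"}), ({"category"}, {"color"})]
print(sorted(sorted(k) for k in candidate_keys(schema, fds)))  # [['category', 'name']]
```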

When a database is poorly designed we get anomalies:
• Redundancy: data is repeated
• Update anomalies: a change must be made in several places
• Delete anomalies: we may lose data when we don't want to
Schema refinement means removing the data anomalies.

Data Anomalies

1. Functional Dependency
   a) Armstrong's axioms
   b) Attribute closure (A+)
   c) Dependency closure (F+)
   d) Minimum cover (Fmin)
2. Normal Forms
   a) BCNF
   b) 3NF
3. Decomposition
   a) Lossless join
   b) Dependency preserving

[Figure: Conceptual design; Schemas; Integrity Constraints]

Normalization Review

"Lousy Tables"
1NF: remove composite / multi-valued attributes
2NF: eliminate the partial functional dependencies of non-prime attributes on key attributes
3NF: eliminate the transitive functional dependencies of non-prime attributes on key attributes (lossless decomposition and dependency preserving)
BCNF (Boyce Codd): eliminate the partial and transitive functional dependencies of prime (key) attributes on the key (lossless decomposition but not dependency preserving)
"Wonderful Tables"

Normalization Review

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) is decomposed into
R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp), where
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp

Decompositions in General

• Can we get the data back correctly?
  – Lossless decomposition
  – Lossy decomposition

• Can we recover the FDs on the 'big' table from the FDs on the small tables?
  – Dependency-preserving decomposition

Problems with Decomposition

• Correct decomposition:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

is decomposed into

Name      Price
Gizmo     19.99
OneClick  24.99
Gizmo     19.99

and

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Lossless Decomposition

• Incorrect decomposition:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

is decomposed into

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

and

Price  Category
19.99  Gadget
24.99  Camera
19.99  Camera

What's wrong? Joining the two tables back on Category gives:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
OneClick  19.99  Camera
Gizmo     24.99  Camera
Gizmo     19.99  Camera

Lossy Decomposition

Theorem: If A1, ..., An → B1, ..., Bm, then the decomposition of

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

into

R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)

is lossless.

Example: name → price, hence the first decomposition is lossless.

Note: we don't necessarily need A1, ..., An → C1, ..., Cp.

Decompositions in General

Correct Decomposition

A decomposition is lossless if we can recover R:

Decompose: R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) into R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)

Recover: R'(A1, ..., An, B1, ..., Bm, C1, ..., Cp) = R1 ⋈ R2, which should equal R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R' is in general larger than R. We must ensure R' = R.
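A decomposition can be tested on a given instance by projecting, joining back, and comparing with the original. A sketch using the Name/Price/Category example, assuming set semantics; the helpers `project` and `join` are illustrative.

```python
# Rows of the Name/Price/Category example; relations use set semantics.
R = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "Gizmo", "Price": 19.99, "Category": "Camera"},
]

def project(rows, attrs):
    # Each projected tuple is stored as sorted (attribute, value) pairs
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def join(p1, p2):
    # Natural join: combine tuples that agree on their shared attributes
    out = set()
    for t in p1:
        for u in p2:
            d1, d2 = dict(t), dict(u)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

original = project(R, ["Name", "Price", "Category"])
good = join(project(R, ["Name", "Price"]), project(R, ["Name", "Category"]))
bad = join(project(R, ["Name", "Category"]), project(R, ["Price", "Category"]))

print(good == original)          # True: R' = R
print(len(bad), original < bad)  # 5 True: R' is strictly larger than R
```

This only tests one instance; the theorem above (the shared attributes functionally determine one side) is what guarantees losslessness for every instance.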

When decomposing a relation R, we want the decomposition to
– minimize redundancy
– avoid loss of information
– preserve functional dependencies (i.e., constraints)
– ensure good query performance

These objectives can be conflicting!
• Boyce-Codd normal form achieves some of these

Decomposition Goals

• Decomposition into Boyce Codd Normal Form (BCNF)
  – Lossless

• Decomposition into 3rd Normal Form (3NF)
  – Lossless
  – Dependency preserving

Normal Forms

[Figure: nested normal forms 3NF, BCNF, 4NF]

In practice:
- Aim for BCNF
- Settle for 3NF (it is enough)
- Don't overdo it!

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if and only if: if there is a non-trivial dependency A1, ..., An → B in R, then {A1, ..., An} is a superkey for R.

Equivalently: for any set of attributes X = {A1, ..., An}, either X+ = X or X+ = all attributes in R.

In English: whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
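The "equivalently" formulation gives a direct, brute-force test: look for an attribute set X whose closure is neither X nor all of R. A sketch, assuming FDs as (left, right) pairs of sets; closures are computed under the full set of FDs and intersected with the relation's attributes, so the same test also works on the relations produced by a decomposition. It is applied here to the SSN example; the helper names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Sets X with X+ ≠ X and X+ ≠ all attributes (closures restricted to schema)."""
    schema = set(schema)
    found = []
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds) & schema
            if x_plus != x and x_plus != schema:
                found.append((x, x_plus))
    return found

fds = [({"SSN"}, {"Name", "City"})]
before = bcnf_violations({"Name", "SSN", "PhoneNumber", "City"}, fds)
after = [bcnf_violations(r, fds) for r in ({"Name", "SSN", "City"}, {"SSN", "PhoneNumber"})]
print(({"SSN"}, {"SSN", "Name", "City"}) in before, after)  # True [[], []]
```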

BCNF is Desirable

Consider the relation below with the FD X → A:

X  Y   A
x  y1  y2
x  y2  ?    (should be y2)

"X → A" ⇒ the 2nd tuple also has y2 in the third column (redundancy!). Not in BCNF.

Such a situation cannot arise in a BCNF relation:
• BCNF ⇒ X must be a key ⇒ we must have X → Y ⇒ we must have y1 = y2   (1)
• X → A ⇒ the two tuples have the same value for A   (2)
• (1) & (2) ⇒ the two tuples are identical
⇒ This situation cannot happen in a relation (a relation is a set of tuples).

BCNF Decomposition Algorithm

Input: relation R + FDs for R
Output: decomposition of R into BCNF relations with "lossless join"

Compute keys for R
Repeat until all relations are in BCNF (no BCNF violations):
    Pick any R' with A → B that violates BCNF
    Decompose R' into R1(A, B) and R2(A, rest)
    Compute FDs for R1 and R2
    Compute keys for R1 and R2

[Figure: R' split into R1 (the A's and B's) and R2 (the A's and the rest).]

Note: we need to compute the FDs on R1 and R2.
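A compact sketch of the decomposition loop, assuming a brute-force violation search that tries the smallest X first. When it finds a violation X → X+, it splits R' into R1 = X+ and R2 = X ∪ (R' − X+), i.e. R1(A, B) and R2(A, rest) with B = X+ − X; this "use the whole closure as B" choice and the function names are illustrative assumptions. On R(A,B,C,D,E) with A → BC and C → D it yields the same three relations as the worked example later in this lecture.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """First X ⊂ rel (smallest first) whose closure within rel is neither X nor rel."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(schema, fds):
    done, todo = [], [set(schema)]
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)                  # already in BCNF
        else:
            x, x_plus = violation
            todo.append(x_plus)               # R1(A, B): X plus what it determines
            todo.append(rel - (x_plus - x))   # R2(A, rest)
    return done

fds = [(set("A"), set("BC")), (set("C"), set("D"))]
parts = bcnf_decompose("ABCDE", fds)
print(sorted("".join(sorted(p)) for p in parts))  # ['ABC', 'AE', 'CD']
```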

A relation is in BCNF ⇒ every entry records a piece of information that cannot be inferred (using only FDs) from the other entries in the relation instance
⇒ No redundant information!

A relation R(ABC):
• B → C: the value of B determines C, and the value of C can be inferred from another tuple with the same B value ⇒ redundancy! (not BCNF)
• A → BC: although the value of A determines the values of B and C, we cannot infer their values from other tuples because no two tuples in R have the same value for A ⇒ no redundancy! (BCNF)

BCNF is Desirable

FD: SSN → Name, City
Key: {SSN, PhoneNumber}
Is it in BCNF?

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

SSN+ = {SSN, Name, City}, which does not include PhoneNumber, so SSN is not a superkey.

NOT IN BCNF!

BCNF Example

Decompose into:

Name  SSN          City
Fred  123-45-6789  Seattle
Joe   987-65-4321  Westfield

SSN          PhoneNumber
123-45-6789  206-555-1234
123-45-6789  206-555-6543
987-65-4321  908-555-2121
987-65-4321  908-555-1234

SSN → Name, City: this FD is now good because SSN is the key of the first relation.

Let's check anomalies: • Redundancy? • Update? • Delete?

IN BCNF!

BCNF Example

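R(A,B,C,D,E) with FDs {A} → {B,C} and {C} → {D}

{A}+ = {A,B,C,D} ≠ {A,B,C,D,E}, so decompose R into R1(A,B,C,D) and R2(A,E)
In R1: {C}+ = {C,D} ≠ {A,B,C,D}, so decompose R1 into R11(C,D) and R12(A,B,C)
Result: R2(A,E), R11(C,D), R12(A,B,C)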

BCNF decomposition Example

Movie Bookings
• Title: name of the movie
• Theater: name of a theater showing the movie
• City

Schema: (Theater, City, Title)

FDs (not booking a movie into two theaters in the same city):
• Theater → City
• Title, City → Theater

Keys? Check the closures:
• {Title, City}+ = R
• {Theater, Title}+ = R
• {Theater, City}+ doesn't include Title

Theater → City violates BCNF.

BCNF: Dependency Preservation

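Why are City and Theater not keys? (Neither City+ = {City} nor Theater+ = {Theater, City} includes Title.)

Let's decompose the table based on that violation: {Theater, City} and {Theater, Title}. This decomposition cannot enforce Title, City → Theater.

Original relation:

Theater  City        Title
guild    Menlo Park  Antz

Decomposed:

Theater  City
guild    Menlo Park

Theater  Title
guild    Antz

Now add a booking of Antz at Park, Menlo Park to the decomposed tables:

Theater  City
guild    Menlo Park
Park     Menlo Park

Theater  Title
guild    Antz
Park     Antz

Each table satisfies Theater → City on its own, but joining them gives:

Theater  City        Title
guild    Menlo Park  Antz
Park     Menlo Park  Antz

which violates Title, City → Theater.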

BCNF: Dependency Preservation
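The lost dependency can be demonstrated with the Menlo Park tables above: each decomposed table satisfies Theater → City, yet their join violates Title, City → Theater. A sketch; `fd_holds` is an illustrative helper that takes column positions.

```python
def fd_holds(rows, lhs, rhs):
    """rows are tuples; lhs/rhs are column positions. True if lhs -> rhs holds."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

theater_city = {("guild", "Menlo Park"), ("Park", "Menlo Park")}   # (Theater, City)
theater_title = {("guild", "Antz"), ("Park", "Antz")}              # (Theater, Title)

# The decomposed table satisfies its own FD, Theater -> City
print(fd_holds(theater_city, [0], [1]))   # True

# Join on Theater to rebuild (Theater, City, Title)
joined = {(th, city, title)
          for th, city in theater_city
          for th2, title in theater_title if th == th2}

# Title, City -> Theater fails on the joined relation
print(fd_holds(joined, [2, 1], [0]))      # False
```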

Tradeoff between BCNF and dependency preservation

R = (Unit, Company, Product)
FDs: Unit → Company; Company, Product → Unit

Unit → Company is a BCNF violation (Unit+ = {Unit, Company}), so we decompose:
R1 = (Unit, Company), with Unit → Company
R2 = (Unit, Product), with no FDs

In BCNF we lost the FD: Company, Product → Unit

BCNF and Dependencies

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3NF if and only if: if there is a non-trivial dependency A1, ..., An → B in R, then {A1, ..., An} is a superkey for R (this part comes from BCNF), or B is part of a key.

Informally: everything depends on the key or is in the key.

The algorithm is complicated:
1. Get a "minimal cover" of FDs
2. Find a lossless-join decomposition of R (which might miss dependencies)
3. Add additional relations to the decomposition to cover any missing FDs of the cover

• The result will be lossless and dependency-preserving 3NF; it might not be BCNF

Decomposition into 3NF

• 3NF decomposition vs. BCNF decomposition:
  – Use the same decomposition steps, for a while
  – 3NF may stop decomposing, while BCNF continues

• Tradeoffs
  – BCNF = no anomalies, but may lose some FDs
  – 3NF = keeps all FDs, but may have some anomalies

3NF and BCNF

• Consider the TEACH relation:
• TEACH(STUDENT, COURSE, INSTRUCTOR) is in 3NF but not in BCNF, with
  – FD1: {STUDENT, COURSE} → INSTRUCTOR
  – FD2: INSTRUCTOR → COURSE

• 3 possible BCNF decompositions of TEACH:
  – R1(STUDENT, INSTRUCTOR), R2(STUDENT, COURSE)
  – R1(COURSE, INSTRUCTOR), R2(COURSE, STUDENT)
  – R1(INSTRUCTOR, COURSE), R2(INSTRUCTOR, STUDENT)

• All three decompositions "lose" FD1!
• The third decomposition is best: it is lossless (INSTRUCTOR → COURSE), so after the join it recaptures FD1 and doesn't generate any spurious tuples!

3NF and BCNF
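A sketch that checks the TEACH claims mechanically: 3NF but not BCNF, and which of the three decompositions is lossless. It assumes the standard binary test (a two-way decomposition is lossless when the shared attributes determine one of the two sides), and its BCNF/3NF tests look only at the given FDs, which suffices for the relation on which they are stated; all helper names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.append(cand)
    return keys

def is_bcnf(schema, fds):
    # Every non-trivial FD needs a superkey on the left
    return all(closure(lhs, fds) >= schema for lhs, rhs in fds if not rhs <= lhs)

def is_3nf(schema, fds):
    # Superkey on the left, or every right-hand attribute is prime
    prime = set().union(*candidate_keys(schema, fds))
    return all(closure(lhs, fds) >= schema or (rhs - lhs) <= prime
               for lhs, rhs in fds if not rhs <= lhs)

def lossless(r1, r2, fds):
    # Binary test: the shared attributes must determine r1 or r2
    shared_plus = closure(r1 & r2, fds)
    return r1 <= shared_plus or r2 <= shared_plus

S, C, I = "STUDENT", "COURSE", "INSTRUCTOR"
TEACH = {S, C, I}
FDS = [({S, C}, {I}), ({I}, {C})]   # FD1, FD2

print(is_3nf(TEACH, FDS), is_bcnf(TEACH, FDS))   # True False
for r1, r2 in [({S, I}, {S, C}), ({C, I}, {C, S}), ({I, C}, {I, S})]:
    print(sorted(r1), sorted(r2), lossless(r1, r2, FDS))
```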

Relational schemas may include additional dependency types

• Multivalued dependencies
  – Here, there may be multiple different values of Y for the same values of X, yet the values of X fix the set of values for Y

• Inclusion dependencies
  – This type of dependency relates the values of attributes in two relations in the schema
  – E.g., in the train operation schema, $\pi_{S\_Name}(Serves) \subseteq \pi_{S\_Name}(Station)$

In this course, we focus on design considerations that follow from functional dependencies only

Additional Dependency Types

Multi-valued Dependencies (MVD)

The multi-valued dependencies are:
• StudentID ->-> Phone Number
• StudentID ->-> Course

Student ID  Phone Number    Course
981949      (519) 222-3344  CS4411
981949      (226) 231-1111  CS4411
981949      (519) 222-3344  CS3336
981949      (226) 231-1111  CS3336

• BCNF eliminates redundancy in each tuple but may leave redundancy among tuples in a relation

• This typically happens if two many-many relationships (or in general: a combination of two types of facts) are represented in one relation

• Every street address is given 3 times and every title is repeated twice • This table does not violate BCNF but has redundancy among tuples.

MVD and Attribute Independence

Definition of Multi-valued Dependencies

Given R(A1,…,An,B1,…,Bm,C1,…,Cp) the MVD A1,…,An ->-> B1,…,Bm holds if: for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp

MVD and Attribute Independence
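The MVD definition can be checked on an instance: whenever two tuples agree on X, the tuple that takes X and Y from the first and all remaining attributes from the second must also be in the relation. A sketch on the Student ID / Phone Number / Course data above; the identifiers `StudentID` and `PhoneNumber` are assumed spellings of those column names, and `mvd_holds` is an illustrative helper.

```python
from itertools import product

ATTRS = ["StudentID", "PhoneNumber", "Course"]
rows = [dict(zip(ATTRS, r)) for r in [
    ("981949", "(519) 222-3344", "CS4411"),
    ("981949", "(226) 231-1111", "CS4411"),
    ("981949", "(519) 222-3344", "CS3336"),
    ("981949", "(226) 231-1111", "CS3336"),
]]

def mvd_holds(rows, x, y, attrs):
    """X ->-> Y: for t, u agreeing on X, the tuple with X, Y from t and
    all remaining attributes from u must also be in the relation."""
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t, u in product(rows, repeat=2):
        if all(t[a] == u[a] for a in x):
            v = {**{a: t[a] for a in x + y}, **{a: u[a] for a in z}}
            if tuple(v[a] for a in attrs) not in present:
                return False
    return True

print(mvd_holds(rows, ["StudentID"], ["PhoneNumber"], ATTRS))      # True
print(mvd_holds(rows, ["StudentID"], ["Course"], ATTRS))           # True
print(mvd_holds(rows[:3], ["StudentID"], ["PhoneNumber"], ATTRS))  # False: one combination missing
```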

Exercises


R(A,B,C,D,E,F) with FDs: A → B, B → A, {B,C} → D, C → E

B and C together form a composite determinant of D. Recall that a Candidate Key (CK) is a set of one or more attributes with which you can determine all other attributes in the relation. (In the end, one of the CKs will be chosen to be the primary key.) Here the CKs are {A,C,F} and {B,C,F}: C and F appear on no right-hand side, and A or B completes the key.

A candidate key is a minimal superkey. Minimal means that you can't remove any attribute from the key and still determine all other attributes.

The non-prime attributes (NP) are simply all attributes that are not part of any CK.

Exercises

R(A,B,C,D,E,F) with FDs: A → B, B → A, {B,C} → D, C → E

Is the relation in 2NF?
• Is there a composite CK? If not, 2NF is automatically achieved.
• Otherwise, check your boxes: in any of your composite CKs, does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, 2NF does not pass (and the relation is thus only in 1NF).

Is the relation in 3NF?
• Does the relation have any non-primes? If no, the relation is automatically in 3NF!
• Otherwise: is there a non-prime that determines (points to) another non-prime (resulting in a transitive dependency, because the last non-prime is determined indirectly)? If yes, 3NF does not pass (and the relation is thus only in 2NF).

Is the relation in BCNF?
• Are all the determinants also candidate keys? If yes, BCNF passes. This is easiest to check by looking at the dependencies you were given and checking that each determinant (left-hand side of the arrow) is in your list of CKs.

Here, C → E violates 2NF: C is part of the CKs {A,C,F} and {B,C,F}, and E is non-prime (likewise {B,C} → D).

Exercises

R(A,B,C,D,E) A → {B,C} {B,C} → A,D D → E

3NF says that a non-prime must not determine another non-prime. Here is a problem: the keys are A and {B,C}, so D and E are non-prime, and D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!

Exercises

R(A,B,C,D) with FDs: AB->C, C->D, and D->A. Indicate all BCNF violations.

The nontrivial FDs that follow (with a single attribute on the right) are: C→A, C→D, D→A, AB→D, AB→C, AC→D, BC→A, BC→D, BD→A, BD→C, CD→A, ABC→D, ABD→C, and BCD→A.

The keys are AB, BC, and BD. Thus, any dependency above that does not have one of these pairs on the left is a BCNF violation: C→A, C→D, D→A, AC→D, and CD→A.

Decompose into: (ACD), (BC). Then into: (AD), (CD), (BC).

Exercises

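R(A,B,C,D) with FDs: AB->C, BC->D, CD->A, and AD->B. Indicate all BCNF violations.

The nontrivial FDs that follow are: AB→C, BC→D, CD→A, AD→B, AB→D, AD→C, BC→A, CD→B, ABC→D, ABD→C, ACD→B, and BCD→A.

The keys are AB, AD, BC, and CD. Every FD above has a key on the left, so there are no violations: the relation is in BCNF!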
