83
Lecture 5 CS4411: Databases II

Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Lecture 5

CS4411: Databases II

Page 2: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Execrcises from Relational Algebra • Functional Dependencies

Agenda

Page 3: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Relational Algebra: Exercises

Page 4: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

§ Select 𝜎<𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑠𝑠𝑠𝑐𝑠𝑠𝑠𝑠𝑠> 𝑅 • Project 𝜋<𝑎𝑠𝑠𝑎𝑠𝑎𝑎𝑠𝑠 𝑠𝑠𝑠𝑠> 𝑅 • Rename 𝜌<𝑠𝑠𝑛 𝑠𝑠𝑠𝑠𝑠𝑎> 𝑅 § Union 𝑅 ∪ 𝑆 § Intersection 𝑅 ∩ 𝑆 • Difference 𝑅 − 𝑆 • Cross product 𝑅 × 𝑆 • Join R ⋈<𝑗𝑠𝑠𝑠 𝑠𝑠𝑠𝑐𝑠𝑠𝑠𝑠𝑠> 𝑆 • Natural join 𝑅 ⋈ 𝑆 • Division 𝑅 ÷ 𝑆

Review of Operators

Page 5: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:

• Union: 𝑅 ∪ 𝑆

If all the tuples of R and S are different, then the union has maximum:

n + m tuples.

The minimum number of tuples that can appear in the result occurs if every tuple of one relation also appears in the other. Then the union has:

max(n, m) tuples.

Page 6: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:

• Natural join: 𝑅 ⋈ 𝑆

If all the tuples in one relation can pair successfully with all the tuples in the other relation, then the natural join has:

n * m tuples.

The minimum number of tuples that can appear in the result occurs if none of the tuples of one relation can pair successfully with all the tuples in the other relation. Then the natural join has:

zero tuples.

Page 7: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:

• 𝜎𝐶 𝑅 x S

If the condition C brings back all the tuples of R, then the cross product will contain: n * m tuples. If the condition C brings back none of the tuples of R, then the cross product will contain: zero tuples.

Page 8: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

We define r as the schema of R and s as the schema of S:

(𝑅 ⋈ 𝑆) πr

δ(πr∩s(S)) where δ is the duplicate-elimination operator 𝑅 ⋈

R –(R – πr(𝑅 ⋈ 𝑆))

Page 9: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

πr(𝑅 ⋈ 𝑆) R –

We define r as the schema of R

Page 10: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

• No manufacturer of PC's may also make laptops

πmaker(σtype = laptop(Product)) ∩

πmaker(σtype = pc(Product)) = ø

Page 11: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Functional Dependencies

Page 12: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Set of relation schemas obtained by translating from an E/R diagram might need improvement – modeling with E/R diagrams is an art, not a science; relies

on experience and intuition – multiple alternative designs are possible, how to choose?

• Problems caused by redundant storage of info – wasted space – anomalies when updating, inserting or deleting tuples

Basic idea: replace a relation schema with a collection of "smaller" schemas. This is called decomposition.

Improving Relation Schemas

Page 13: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• There is a theory for systematically guiding the improvement of relational designs, called normalization

• Normalization uses the notion of integrity constraints (ICs) on the information – functional dependencies – multi-valued dependencies – referential integrity constraints

Improving Relation Schemas

Page 14: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

A database schema is in First Normal Form if all tables are flat

Name GPA Courses

Alice 3.8

Bob 3.7

Carol 3.9

Math

DB

OS

DB

OS

Math

OS

Student Name GPA

Alice 3.8

Bob 3.7

Carol 3.9

Student

Course

Math

DB

OS Student Course

Alice Math

Carol Math

Alice DB

Bob DB

Alice OS

Carol OS

Takes

Course

May need to add keys

First Normal Form (1NF)

Page 15: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Based on Functional Dependencies – 2nd Normal Form (obsolete) – 3rd Normal Form – Boyce Codd Normal Form (BCNF)

• Based on Multivalued Dependencies – 4th Normal Form

• Based on Join Dependencies – 5th Normal Form

Discuss next

Normal Forms

Page 16: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• A form of constraint (hence, part of the schema) • Finding them is part of the database design

Functional Dependencies

Functional Dependency:

A1, A2, …, An à B1, B2, …, Bm

Meaning: If two tuples agree on the attributes

then they must also agree on the attributes A1, A2, …, An

B1, B2, …, Bm

We say "functionally determine"

Page 17: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Definition: A1, ..., An à B1, ..., Bm holds in R if: "t, t’ Î R, (t.A1=t’.A1 Ù ... Ù t.An=t’.An Þ t.B1=t’.B1 Ù ... Ù t.Bm=t’.Bm )

A1 ... An B1 ... Bm

if t, t’ agree here

then t, t’ agree here

t

t’

R

Functional Dependencies

Page 18: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• This relation satisfies C→T but does not satisfy S→T

• In general, a schema in which for a given relation R there are two properties A,B such that A→B but A is not a key is problematic!

S C T

Bart Math Mrs. Krabappel

Lisa Math Mrs. Krabappel

Lisa Logic Ms. Hoover

Example

Page 19: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• EmpID à Name, Phone, Position • Position à Phone • but Phone à Position

EmplID Name Phone Position

E0045 Smith 1234 Clerk

E1881 John 9876 SalesRep

E1111 Smith 9876 SalesRep

E9999 Mary 1234 Lawyer

Example

Page 20: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

FD’s are constraints: • On some instances they hold • On others they don’t

name category color department price

Gizmo Gadget Green Toys 49

Tweaker Gadget Green Toys 99

Does this instance satisfy all the FDs ?

name à color category à department color, category à price

Example

Consider these FDs:

Product(name, category, color, department, price)

No: color, category à price doesn’t hold

Page 21: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

FD’s are constraints: • On some instances they hold • On others they don’t

Does this instance satisfy all the FDs ?

name à color category à department color, category à price

Example

Consider these FDs:

Product(name, category, color, department, price)

name category color department price

Gizmo Gadget Green Toys 49

Tweaker Gadget Black Toys 99

Gizmo Stationary Green Office-supp. 59

Yes!

Page 22: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

If some FDs are satisfied, then others are satisfied too

If all these FDs are true: name à color category à department color, category à price

Then this FD also holds: name, category à price

Why ?? We say that the new FD is implied

Inference

Page 23: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Is equivalent to

Splitting rule and Combing rule

A1 ... An B1 ... Bm

A1, A2, …, An à B1, B2, …, Bm

A1, A2, …, An à B1 A1, A2, …, An à B2 . . . . . A1, A2, …, An à Bm

Inference

Page 24: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Trivial Rule

A1 … Am where i = 1, 2, ..., n

A1, A2, …, An à Ai

Inference

Ai à Ai

Page 25: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Transitive Closure Rule

If A1, A2, …, An à B1, B2, …, Bm

and B1, B2, …, Bm à C1, C2, …, Cp

then A1, A2, …, An à C1, C2, …, Cp

A1 … An B1 … Bm C1 ... Cp

Inference

Page 26: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

From: 1. name à color 2. category à department 3. color, category à price

name, category à price To:

Example

Inferred FD Which Rule did we apply ?

4. name, category à name Trivial rule 5. name, category à color Transitivity on 4, 1 6. name, category à category Trivial rule 7. name, category à color, category Split/combine on 5, 6 8. name, category à price Transitivity on 3, 7

Page 27: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

These inference rules are called Armstrong’s axioms:

ü Reflexivity: ü if X is a set of attributes of R and YÍX, then X®Y

ü Insertion (Augmentation): • if R satisfies X®Y for two sets of attributes X,Y of R, then

for every set of attributes Z in R it holds that XZ®YZ

ü Transitivity: • if X®Y and Y®Z both hold for a relation R, then X®Z also

holds

Armstrong's Axioms

Page 28: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

For a set of dependencies F and another dependency X®Y, we say that X®Y can be deduced from F, denoted F⊢X®Y, if X®Y can be inferred from F using only Armstrong’s axioms

Armstrong's Axioms

The following inference rules follow from Armstrong’s axioms (and can be inferred from them): üUnification: ü If X®Y and X®Z both hold then X®YZ holds üSplit: ü If X®YZ holds then X®Y and X®Z both hold üPseudo-transitivity: ü If X®Y and YW®Z both hold then XW®Z holds

Page 29: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

29

• No need for FD’s with > 1 attribute on right. – But sometimes convenient to combine FD’s as

a shorthand. – Example: name -> addr and name -> wife

becomes name -> addr wife • > 1 attribute on left may be essential!

– Example: store candy -> price

FD's with Multiple Attributes

Page 30: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Given a schema R and a set of functional dependencies F: – A superkey of R is a set of attributes KÍR such that F ⊨ K®R – A key of R is a set of attributes KÍR such:

1. K is a superkey of R 2. No proper subset of K is a superkey of R

• A key is also called a minimal key or an admissible key

– As hinted before, keys are useful in identifying problematic schemas, e.g., the existence of a dependency X®Y in which X is not a key

Keys

K is a superkey for relation R if K functionally determines all of R.

Page 31: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Consumers(name, addr, candiesLiked, manf, favCandy) • {name, candiesLiked} is a superkey because together these

attributes determine all the other attributes. – name -> addr favCandy – candiesLiked -> manf

Example

• {name, candiesLiked} is a key because neither {name} nor {candiesLiked} is a superkey. – name doesn’t -> manf; – candiesLiked doesn’t -> addr.

• There are no other keys, but lots of superkeys. – Any superset of {name, candiesLiked}.

Page 32: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

32

• Keys in E/R concern entities. • Keys in relations concern tuples. • Usually, one tuple corresponds to one entity,

so the ideas are the same. • But in poor relational designs, one entity can

become several tuples, so E/R keys and Relational keys are different.

E/R and Relational Keys

Page 33: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Relational key = {name candiesLiked} But in E/R, name is a key for Consumers, and candiesLiked is a key for Candies. Note: 2 tuples for Janeway entity and 2 tuples for Twizzlers entity.

name addr candiesLiked manf favCandy

Jane Voyager Twizzlers Hershey Smarties Jane Voyager Smarties Nestle Smarties Spock Enterprise Twizzlers Hershey Twizzlers

Example Data

Page 34: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Suppose schema was obtained from an E/R diagram. • If relation R came from an entity set E, then key for

R is the keys of E • If R came from a binary relationship from E1 to E2:

– many-many: key for R is the keys of E1 and E2 – many-one: key for R is the keys for E1 (but not the keys

for E2) – one-one: key for R is the keys for E1; another key for R is

the keys for E2

Discovering Keys

Page 35: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Consumers Candies Likes

Likes(consumer, candy) Favorite

Favorite(consumer, candy)

Married

husband wife

Married(husband, wife)

name addr name manf

Buddies

1 2

Buddies(name1, name2)

key: consumer candy

key: name1 name2

key: consumer

keys: husband or wife

Key Example

Page 36: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Example: “no two courses can meet in the same room at the same time” tells us: hour, room -> course Ultimately, FD's and keys come from the semantics of the application!!! • FD's and keys apply to the schema, not to

specific instantiations of the relations

More FD's from Application

Page 37: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• X+ : Closure of an attribute set X – The set of all attributes that are determined by X

• K: a key

– minimum set of attributes that determines all attributes

• F+ : Closure of a dependency set F – The set of all dependencies that are implied from F

• Fmin: a minimum cover of a dependency set F

– a minimum set of FDs that is equivalent to F

Some Important Concepts

Page 38: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• We already know that we have FD's A -> B and B -> C, then it is also true that A -> C. • What about a complete chain of such deductions? • Called dependency set closure

Closure of a Dependency Set

In general, we need to consider: • Dependency set closure (F+) • Attributes set closure (X+) • Attribute closure (A+)

F F+ f1

f2 f3

f Implies

The closures of Dependency set and Attributes sets are used to define criteria for the goodness of a decomposition

the size of F+ could be exponential in the size of F

Sets F1 and F2 of functional dependencies are considered equal if they have the same closure (i.e. F1+ = F2+)

Page 39: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Given a relation schema R and a set F of FDs – is some FD logically implied by this set F?

• Example – R = {A,B,C,G,H,I} – F = {A ->B, A ->C, CG -> H, CG -> I, B -> H} – would A ->H be logically implied by F? – yes (you can prove this using the definition of FD)

Closure of F: F+ = all FDs logically implied by F • How to compute F+?

– we can use Armstrong's axioms

Closure of a Dependency Set

Page 40: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Given a set F of functional dependencies, any set of functional dependencies that’s equivalent to F is called a cover (basis)

– We limit the possibilities by requiring that each dependency has a single attribute on the right-hand side

– How many bases are there for a relation R with n functional dependencies in F?

mnCover (Basis)

Recall: Sets F1 and F2 of functional dependencies are considered equal if they have the same closure (i.e. F1+ = F2+)

Page 41: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

A minimal cover (basis) for a relation R is a basis that satisfies the following conditions: • All functional dependencies in B have singleton right-hand

sides • If any functional dependency is removed from B, the result

is no longer a basis • If any left-hand side attribute is removed from a functional

dependency of B, the result is no longer a basis

Minimal cover/basis (Minimal set of FDs)

• Every set of FDs has an equivalent minimal set • There can be several equivalent minimal sets • There is no simple algorithm for computing a minimal set of FDs that is

equivalent to a set F of FDs

Page 42: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

The closure of an attributes set has an important role in the design of relational schemas

– For example, K is a superkey of R iff K+=R – Thus, we need an algorithm to compute it.

Closure of an Attributes Set

A key K for a given relation R is a minimal set of attributes A1, A2, ..., An such that closure K+={A1, A2, ..., An}+ is the set of all attributes of R

Given a set of attributes X={A1, …, An} and a set of FDs F: The closure, X+ is the set of attributes B s.t. {A1, …, An} à B

In English: Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X

Page 43: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

The following simple algorithm computes the closure XF

+ for a given dependency set F and attributes set X

AList := X

Repeat

For every Y®ZÎF do

If YÍAList then

AList := AListÈZ

Until no change to AList

Return AList

Closure of an Attributes Set

Page 44: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Compute the closure of X={A,B} for the dependency set F={A®C, BC®A, AC®D, CE®F }

– Initialization: AList={A,B}

– From A®C, we get AList={A,B,C}

– From AC®D, we get AList={A,B,C,D}

– The other dependencies do not add anything

– The closure of an attributes set XF+ = { A,B,C,D }

Execution Example

Page 45: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Suppose R(A,B,C,D,E,F) and the FD's are: F={AB ® C, BC ® AD, D ® E, CF ® B} We wish to test whether AB ® D follows from F We compute {A,B}+ which is {A,B,C,D,E}. Since D is a member of the closure, we conclude that it follows.

Closure of an Attributes Set

Page 46: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

How to find if a set of attributes X is a superkey / key? • Compute X+ • If X+ = all attributes, then X is a superkey • Consider minimal superkeys (called keys) • keys are also called Candidate Keys (CK)

Note: there can be exponentially many candidate keys!

Computing Superkeys Keys

Definitions: • Prime Attribute - Attribute A of R that is Member of some Candidate Key X • Non-Prime Attribute - An Attribute that is not a Member of any Candidate Key X

K is a superkey for relation R if K functionally determines all of R.

Page 47: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

R = Product(name, price, category, color)

What is the key ? Let's try (name, category) as a key and compute a closure (name, category) + = (name, category, price, color)

Hence (name, category) is a key

Computing Keys

name, category à price category à color

= the set R of all attributes Ø this is a superkey Ø this is a key, since neither name

nor category alone is a superkey

Page 48: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Schema refinement means removing the data anomalies.

Data Anomalies

Page 49: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

1. Functional Dependency a) Amstrong’s axioms b) Attribute closure (A+) c) Dependency closure (F+) d) Minimum cover (Fmin)

2. Normal Forms a) BCNF b) 3NF

3. Decomposition a) Lossless join b) Dependency preserving

Conceptual design

Schemas Integrity Constraints

Normalization Review

Page 50: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

2NF

3NF

BCNF (Boyce Codd)

1NF Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes

Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes

Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key

Lossless Decomposition but not Dependency Preserving

Lossless Decomposition and Dependency Preserving

“Lousy Tables”

Remove Composite/ Multi-Value Attributes

“Wonderful Tables”

Normalization Review

Page 51: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

Decompositions in General

Page 52: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Can we get the data back correctly ? – Lossless decomposition – Lossy decomposition

• Can we recover the FD’s on the ‘big’ table from the FD’s on the small tables ? – Dependency-preserving decomposition

Problems with Decomposition

Page 53: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Correct decomposition:

Name Price Category

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera

Name Price

Gizmo 19.99

OneClick 24.99

Gizmo 19.99

Name Category

Gizmo Gadget

OneClick Camera

Gizmo Camera

Lossless Decomposition

Page 54: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Incorrect decomposition: Name Price Category

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera Name Category

Gizmo Gadget

OneClick Camera

Gizmo Camera

Price Category

19.99 Gadget

24.99 Camera

19.99 Camera

What’s wrong ??

Lossy Decomposition

Name Price Category

Gizmo 19.99 Gadget

OneClick 24.99 Camera

OneClick 19.99 Camera

Gizmo 24.99 Camera

Gizmo 19.99 Camera

Page 55: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

Theorem If A1, ..., An à B1, ..., Bm Then the decomposition is lossless

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

Example: name à price, hence the first decomposition is lossless

Note: don’t need necessarily A1, ..., An à C1, ..., Cp

Decompositions in General

Page 56: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

Correct Decomposition

A decomposition is lossless if we can recover:

Decompose

Recover

R'(A1, ..., An, B1, ..., Bm, C1, ..., Cp) == R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R' is in general larger than R. Must ensure R' = R

Page 57: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

When decomposing a relation R, we want the decomposition to

– minimize redundancy – avoid loss of information – preserve functional dependencies (i.e., constraints) – ensure good query performance

These objectives can be conflicting! • Boyce-Codd normal form achieves some of these

Decomposition Goals

Page 58: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Decomposition into Boyce Codd Normal Form (BCNF) – Losselss

• Decomposition into 3rd Normal Form (3NF)

– Lossless – Dependency preserving

Normal Forms

3NF

BCNF

4NF

In practice: - Aim for BNCF - Settle for 3NF (it is enough) - don’t overdo it!

Page 59: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

A simple condition for removing anomalies from relations:

Equivalently: for any set of attributes X = { A1, ..., An } , either X+ = X or X+ = all attributes in R

A relation R is in BCNF if and only if:

If there is a non-trivial dependency A1, ..., An à B in R, then {A1, ..., An} is a superkey for R

Boyce-Codd Normal Form

In English:

Whenever a set of attributes of R is determining another attribute, should determine all the attributes of R.

Page 60: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Consider the relation: X→A “X →A” Þ The 2nd tuple also has y2 in the third column ( redundancy ! ) Such a situation cannot arise in a BCNF relation: BCNF Þ X must be a key Þ we must have X→Y

Þ we must have “y1 = y2” (1) X→A Þ The two tuples have the same value for A (2) (1) & (2) Þ The two tuples are identical

Þ This situation cannot happen in a relation

X Y A x y1 y2 x y2 ?

Should be y2

Redundancy! Not in BCNF

BCNF is Desirable

Page 61: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Input: relation R + FDs for R Output: decomposition of R into BCNF relations with “lossless join”

Compute keys for R Repeat until all relations are in BCNF (no BCNF violations): Pick any R’ with A ® B that violates BCNF Decompose R’ into R1(A, B) and R2(A, rest) Compute FDs for R1 and R2 Compute keys for R1 and R2

BCNF Decomposition Algorithm

A’s rest B’s

R1 R2

Note: need to compute the FDs on R1, R2

Decompose:

Page 62: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

A relation is in BCNF Þ every entry records a piece of information that cannot

be inferred (using only FDs) from the other entries in the relation instance

Þ No redundant information !

A relation R(ABC) • B→C: The value of B determines C, and the value of C

can be inferred from another tuple with the same B value Þ redundancy ! (not BCNF)

• A→BC: Although the value of A determines the values of B and C, we cannot infer their values from other tuples because no two tuples in R have the same value for A Þ no redundancy ! (BCNF)

BCNF is Desirable

Page 63: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

FD: SSN à Name, City Key: {SSN, PhoneNumber} Is it in BCNF?

Name SSN PhoneNumber City

Fred 123-45-6789 206-555-1234 Seattle

Fred 123-45-6789 206-555-6543 Seattle

Joe 987-65-4321 908-555-2121 Westfield

Joe 987-65-4321 908-555-1234 Westfield

SSN+ ={SSN, Name, City} but no PhoneNumber

BCNF Example

NOT IN BNCF!!

Page 64: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield

SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234

SSN à Name, City

Let’s check anomalies: • Redundancy ? • Update ? • Delete ?

BCNF Example

IN BNCF!!

This FD is now good because it is the key

Page 65: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

R(A,B,C,D,E) {A}+ = {A,B,C,D} ≠ {A,B,C,D,E}

R1(A,B,C,D) {C}+ = {C,D} ≠ {A,B,C,D}

R2(A,E) R11(C,D) R12(A,B,C)

{A} à {B,C} {C} à {D}

BCNF decomposition Example

Page 66: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Movie Bookings § Title name of movie § Theater, name of theaters showing the movie § City

FDs (not booking a movie into two theaters in the same city) : § Theater à city § Title,city à theater

Keys? Check for closure § {title, city}+ = R § {theatre, title}+ = R § {theatre,city}+ doesn't include title

Theaterà city violates BCNF

Theatre City title

BCNF: Dependency Preservation

Why city or theater are not keys?

Page 67: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Lets decomposed the table based on that violation

{theater, city} and {theater, title} This decomposition cannot handle Title,city à theater

Theatre city

guild Menlo Park

Theatre title

guild Antz

Theatre City title

guild M P Antz

Park M P Antz

Theatre City title

guild M P Antz

Theatre city

guild Menlo Park

Park Menlo Park

Theatre title

guild Antz

Park Antz

BCNF: Dependency Preservation

Tradeoff between BNCF and Dependency preservation

Page 68: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

There is a BCNF violation, and we decompose.

No FDs

In BCNF we lost the FD: Company, Product à Unit

BCNF and Dependencies

R = (Unit, Company, Product) Unit à Company Company, Product à Unit

R1 = (Unit, Company) Unit à Company

R2 = (Unit, Product)

Page 69: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

A simple condition for removing anomalies from relations:

Solution: 3rd Normal Form (3NF)

A relation R is in 3NF if and only if:

If there is a non-trivial dependency A1, ..., An à B in R, then {A1, ..., An} is a superkey for R,

or B is part of a key

This comes from BCNF

Informally: everything depends on the key or is in the key

Page 70: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

The algorithm is complicated 1. Get a “minimal cover” of FDs 2. Find a lossless-join decomposition of R

(which might miss dependencies) 3. Add additional relations to the

decomposition to cover any missing FDs of the cover

• Result will be lossless, will be dependency-preserving 3NF; might not be BCNF

Decomposition into 3NF

Page 71: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• 3NF decomposition v.s. BCNF decomposition: – Use same decomposition steps, for a while – 3NF may stop decomposing, while BCNF continues

• Tradeoffs

– BCNF = no anomalies, but may lose some FDs – 3NF = keeps all FDs, but may have some anomalies

3NF and BCNF

Page 72: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• Consider the TEACH Relation: • TEACH(STUDENT, COURSE, INSTRUCTOR) • in 3NF but not in BCNF with

– FD1: {STUDENT, COURSE} à INSTRUCTOR – FD2: INSTRUCTOR à COURSE

• 3 Possible BNCF Decompositions of TEACH: – R1(STUDENT, INSTRUCTOR), R2(STUDENT, COURSE) – R1(COURSE, INSTRUCTOR), R2(COURSE, STUDENT) – R1(INSTRUCTOR, COURSE), R2 (INSTRUCTOR, STUDENT)

• All Three decompositions "lose" FD1! • 3NF is best since after Join, Recaptures FD1 and doesn’t

generate any spurious Tuples!

3NF and BCNF

Page 73: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Relational schemas may include additional dependency types

• Multivalued dependencies – Here, there may be multiple different values of Y for the same

values of X, yet the values of X fix the set of values for Y • Inclusion dependencies

– This types of dependency relates between the values of attributes in two relations in the schema

– E.g., in the train operation pS_Name (Serves) Í pS_Name (Station)

In this course, we focus on design considerations that follow from functional dependencies only

Additional Dependency Types

Page 74: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

The multi-valued dependencies are: StudentID ->-> Phone Number StudentID ->-> Course

Multi-valued Dependencies (MVD)

Student ID Phone Number Course

981949 (519) 222-3344 CS4411

981949 (226) 231-1111 CS4411

981949 (519) 222-3344 CS3336

981949 (226) 231-1111 CS3336

Page 75: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

• BCNF eliminates redundancy in each tuple but may leave redundancy among tuples in a relationship

• This typically happens if two many-many relationships (or in general: a combination of two types of facts) are represented in one relation

• Every street address is given 3 times and every title is repeated twice • This table does not violate BCNF but has redundancy among tuples.

MVD and Attribute Independence

Page 76: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Definition of Multi-valued Dependencies

Given R(A1,…,An,B1,…,Bm,C1,…,Cp) the MVD A1,…,An ->-> B1,…,Bm holds if: for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp

Page 77: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

MVD and Attribute Independence

Page 78: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

Page 79: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

R(A,B,C,D,E,F) A → B B → A {B,C} → D C → E

B and C in this relation is a composite key that together determine D. Recall that a Candidate Key (CK) is one or more attributes that with which you can determine all other attributes in the relation. (In the end, one of the CK:s will be chosen to be the primary key.)

A candidate key is a minimal super key. Minimal means that you can’t remove any attribute from the key and still determine all other attributes.

The non-primary attributes (NP) are simply all attributes that are not part of any CK.

Page 80: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

R(A,B,C,D,E,F) A → B B → A {B,C} → D C → E

B and C in this relation is a composite key that together determine D. Is the relation in 2NF? Is there a composite CK? If not, 2NF is automatically achieved. Otherwise, check your boxes. In any of your composite CKs: does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, NF2 does not pass (and the relation is thus in 1NF). Is the relation in 3NF? Does the relation have any non-primes? If no, the relation is automatically in 3NF! Otherwise: Is there a non-prime that determines (point to) another non-prime (resulting in a transitive dependency because the last non-prime is determined indirectly)? If yes, 3NF does not pass (and the relation is thus in 2NF). Is the relation in BCNF? Are all the determinants also candidate keys? If yes, BCNF pass. This is easiest to check by looking at the dependencies you were given and check that each key (left hand side of the arrow) is in your list of CKs

violates 2NF

Page 81: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

R(A,B,C,D,E) A → {B,C} {B,C} → A,D D → E

3NF say that a non-prime must not determine another non-prime. Here is a problem: D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!

Page 82: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

R(A,B,C,D) with FDs: AB->C, C->D, and D->A Indicate all BNCF violations CàA, CàD, DàA, ABàD, ABà C, ACàD, BCàA, BCàD, BDàA, BDàC, CDàA, ABCàD, ABDàC, and BCDàA

keys are AB, BC, and BD Decompose into: (ACD), (BC) Then into: (AD), (CD), (BC)

A C

D

Thus, any dependency above that does not have one of these pairs on the left is a BCNF violation

B

Page 83: Lecture 5 - babanski.com · type = pc (Product)) = ø . Functional Dependencies • Set of relation schemas obtained by translating from an E/R diagram might need improvement –

Exercises

R(A,B,C,D) with FDs: AB->C, BC->D, and CD->A, AD->B Indicate all BNCF violations

ABàC, BCàD, CDàA, ADàB, ABàD, ADàC, BCàA, CDàB, ABCàD, ABDàC, ACDàB and BCDàA

keys are AB, AD, BC, CD In BNCF form!

C

D

A B