33
CS 405G: Introduction to Database Systems 16. Functional Dependency

CS 405G: Introduction to Database Systems 16. Functional Dependency

Embed Size (px)

Citation preview

Page 1: CS 405G: Introduction to Database Systems 16. Functional Dependency

CS 405G: Introduction to Database Systems

16. Functional Dependency

Page 2: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Today’s Topic

Functional Dependency. Normalization Decomposition BCNF

Page 3: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Motivation

How do we tell if a design is bad, e.g.,Enroll(SID, Sname, CID, Cname, grade)? This design has redundancy, because the name of an employee

is recorded multiple times, once for each project the employee is taking

SID CID Sname Cname grade

1234 10 John Smith DB A

1123 9 Ben Liu NET A

1234 9 John Smith NET B

1123 10 Ben Liu DB C

1023 10 Susan Sidhuk DB B

Page 4: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

SID Sname

1234 John Smith

1123 Ben Liu

1023 Susan Sidhuk

SID CID grade

1234 10 A

1123 9 A

1234 9 B

1123 10 C

1023 10 B

CID Cname

9 NET

10 DB

Page 5: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Why redundancy is bad?

Waste disk space. What if we want to perform update operations to the relation

INSERT an new project that no employee has been assigned to it yet.

UPDATE the name of “John Smith” to “John L. Smith” DELETE the last employee who works for a certain

projectSID CID Sname Cname grade

1234 10 John Smith DB A

1123 9 Ben Liu NET A

1234 9 John Smith NET B

1123 10 Ben Liu DB C

1023 10 Susan Sidhuk DB B

Page 6: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Functional dependencies

A functional dependency (FD) has the form X -> Y, where X and Y are sets of attributes in a relation R

X -> Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y t1[X] = t2[X] t1[Y] = t2[Y]

X Y Z

a b c

a ? ?

X Y Z

a b c

a b ?

Must be “b”

Could be anything, e.g. d

X Y Z

a b c

a b d

Page 7: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

FD examples

Address (street_address, city, state, zip) street_address, city, state -> zip zip -> city, state zip, state -> zip?

This is a trivial FD Trivial FD: LHS RHS

zip -> state, zip? This is non-trivial, but not completely non-trivial Completely non-trivial FD: LHS ∩ RHS = ?

Page 8: CS 405G: Introduction to Database Systems 16. Functional Dependency

Functional Dependencies

An FD is a property of the attributes in the schema R 

The constraint must hold on every relation instance r(R)

 If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])

04/19/23 Chen Qian @ Univ of Kentucky

Page 9: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Keys redefined using FD’s

Let attr(R) be the set of all attributes of R, a set of attributes K is a (candidate) key for a relation R if

K -> attr(R) - K, and That is, K is a “super key”

No proper subset of K satisfies the above condition That is, K is minimal (full functional dependent)

Address (street_address, city, state, zip) {street_address, city, state, zip} {street_address, city, zip} {street_address, zip} {zip}

Super key

Super key

Key

Non-key

Page 10: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Reasoning with FDs

Given a relation R and a set of FDs F Does another FD follow from F?

Are some of the FDs in F redundant (i.e., they follow from the others)?

Is K a key of R? What are all the keys of R?

Page 11: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Attribute closure

Given R, a set of FDs F that hold in R, and a set of attributes Z in R:The closure of Z (denoted Z+) with respect to F is the set of all attributes {A1, A2, …} functionally determined by Z (that is, Z -> A1 A2 …)

Algorithm for computing the closure Start with closure = Z If X -> Y is in F and X is already in the closure, then also

add Y to the closure Repeat until no more attributes can be added

Page 12: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

A more complex example

WorkOn(EID, Ename, email, PID, Pname, Hours)

EID -> Ename, email email -> EID PID -> Pname EID, PID -> Hours

(Not a good design, and we will see why later)

Page 13: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Example of computing closure F includes:

EID -> Ename, email email -> EID PID -> Pname EID, PID -> Hours

{ PID, email }+ = ? Starting from: closure = { PID, email } email -> EID

Add EID; closure is now { PID, email, EID } EID -> Ename, email

Add Ename, email; closure is now { PID, email, EID, Ename } PID -> Pname

Add Pname; close is now { PID, Pname, email, EID, Ename } EID, PID -> hours

Add hours; closure is now all the attributes in WorksOn

Page 14: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Using attribute closure

Given a relation R and set of FDs F Does another FD X -> Y follow from F?

Compute X+ with respect to F If Y X+, then X -> Y follow from F

Is K a super key of R? Compute K+ with respect to F If K+ contains all the attributes of R, K is a super key

Is a super key K a key of R? Test where K’ = K – { a | a K} is a superkey of R for all

possible a

Page 15: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Rules of FDs

Armstrong’s axioms Reflexivity: If Y X, then X -> Y Augmentation: If X -> Y, then XZ -> YZ for any Z Transitivity: If X -> Y and Y -> Z, then X -> Z

Rules derived from axioms Splitting: If X -> YZ, then X -> Y and X -> Z Combining: If X -> Y and X -> Z, then X -> YZ

Page 16: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Using rules of FD’s

Given a relation R and set of FDs F Does another FD X -> Y follow from F?

Use the rules to come up with a proof Example:

F includes:EID -> Ename, email; email -> EID; EID, PID -> Hours, Pid -> Pname

PID, email ->hours?email -> EID (given in F)PID, email -> PID, EID (augmentation)PID, EID -> hours (given in F)PID, email -> hours (transitivity)

Page 17: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Example of redundancy

WorkOn (EID, Ename, email, PID, hour) We say X -> Y is a partial dependency if there exist a X’

X such that X’ -> Y e.g. EID, email-> Ename, email

Otherwise, X -> Y is a full dependency e.g. EID, PID -> hours

EID PID Ename email Pname Hours

1234 10 John Smith [email protected] B2B platform 10

1123 9 Ben Liu [email protected] CRM 40

1234 9 John Smith [email protected] CRM 30

1023 10 Susan Sidhuk [email protected] B2B platform 40

Page 18: CS 405G: Introduction to Database Systems 16. Functional Dependency

Database normalization relates to the level of redundancy in a relational database’s structure.

The key idea is to reduce the chance of having multiple different version of the same data.

Well-normalized databases have a schema that reflects the true dependencies between tracked quantities.

Any increase in normalization generally involves splitting existing tables into multiple ones, which must be re-joined each time a query is issued.

Database Normalization

04/19/23 Chen Qian @ University of Kentucky 19

Page 19: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 20

Normalization

A normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.

A normal form is a certification that tells whether a relation schema is in a particular state

Page 20: CS 405G: Introduction to Database Systems 16. Functional Dependency

Normal Forms Edgar F. Codd originally established three normal

forms: 1NF, 2NF and 3NF.

3NF is widely considered to be sufficient.

Normalizing beyond 3NF can be tricky with current SQL technology as of 2005

Full normalization is considered a good exercise to help discover all potential internal database consistency problems.

04/19/23 Chen Qian @ University of Kentucky 21

Page 21: CS 405G: Introduction to Database Systems 16. Functional Dependency

First Normal Form ( 1NF )

NF is to characterize a relation (not an attribute, a key, etc…) We can only say “this relation or table is in 1NF”

A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

04/19/23 Chen Qian @ University of Kentucky 22

Page 22: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Page 23: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 24

2nd Normal Form An attribute A of a relation R is a nonprimary attribute

if it is not part of any key in R, otherwise, A is a primary attribute.

R is in (general) 2nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R

Page 24: CS 405G: Introduction to Database Systems 16. Functional Dependency

Redundancy Example

If a key will result a partial dependency of a nonprimary attribute.

e.g. EID, PID-> Ename In this case, the attribute (Ename) should be separated

with its full dependency key (EID) to be a new table.

So, to check whether a table includes redundancy. Try every nonprimary attribute and check whether it fully depends on any key.

04/19/23 Chen Qian @ University of Kentucky 25

Page 25: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ Univ of Kentucky

Page 26: CS 405G: Introduction to Database Systems 16. Functional Dependency

Second normal Form ( 2NF )

2NF prescribes full functional dependency on the primary key.

It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key.

It requires that there are no non-trivial functional dependencies of a non-key attribute on a part (subset) of a candidate key.

A table is said to be in the 2NF if and only if it is in the 1NF and every non-key attribute is irreducibly dependent on the primary key

04/19/23 Chen Qian @ University of Kentucky 27

Page 27: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 28

Decomposition

Decomposition eliminates redundancy To get back to the original relation, use natural join.

EID PID Ename email Pname Hours

1234 10 John Smith [email protected] B2B platform 10

1123 9 Ben Liu [email protected] CRM 40

1234 9 John Smith [email protected] CRM 30

1023 10 Susan Sidhuk [email protected] B2B platform 40

Decomposition

EID Ename email

1234 John Smith [email protected]

1123 Ben Liu [email protected]

1023 Susan Sidhuk [email protected]

EID PID Pname Hours

1234 10 B2B platform 10

1123 9 CRM 40

1234 9 CRM 30

1023 10 B2B platform 40

Foreign key

Page 28: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 29

Decomposition

Decomposition may be applied recursively

EID PID Pname Hours

1234 10 B2B platform 10

1123 9 CRM 40

1234 9 CRM 30

1023 10 B2B platform 40

PID Pname

10 B2B platform

9 CRM

EID PID Hours

1234 10 10

1123 9 40

1234 9 30

1023 10 40

Page 29: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 30

Unnecessary decomposition

Fine: join returns the original relation Unnecessary: no redundancy is removed, and now EID

is stored twice->

EID Ename email

1234 John Smith [email protected]

1123 Ben Liu [email protected]

1023 Susan Sidhuk [email protected]

EID Ename

1234 John Smith

1123 Ben Liu

1023 Susan Sidhuk

EID email

1234 [email protected]

1123 [email protected]

1023 [email protected]

Page 30: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 31

Bad decomposition

Association between PID and hours is lost Join returns more rows than the original relation

EID PID Hours

1234 10 10

1123 9 40

1234 9 30

1023 10 40

EID PID

1234 10

1123 9

1234 9

1023 10

EID Hours

1234 10

1123 40

1234 30

1023 40

Page 31: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 32

Lossless join decomposition

Decompose relation R into relations S and T attrs(R) = attrs(S) attrs(T) S = πattrs(S) ( R ) T = πattrs(T) ( R )

The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = S T

Any decomposition gives R S T (why?) A lossy decomposition is one with R S T

Page 32: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 33

Loss? But I got more rows->

“Loss” refers not to the loss of tuples, but to the loss of information Or, the ability to distinguish different original tuples

EID PID Hours

1234 10 10

1123 9 40

1234 9 30

1023 10 40

EID PID

1234 10

1123 9

1234 9

1023 10

EID Hours

1234 10

1123 40

1234 30

1023 40

Page 33: CS 405G: Introduction to Database Systems 16. Functional Dependency

04/19/23 Chen Qian @ University of Kentucky 34

Questions about decomposition

When to decompose

How to come up with a correct decomposition (i.e., lossless join decomposition)