Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000

Lecture #3

Functional Dependencies
Normalization
Relational Algebra

Thursday, October 12, 2000

Administration

• Homework #1 due today.

• Project descriptions & groups due today.

• Homework #2 available today.

• Exam date is looking like December 7th

– Complaints?

• Projects: tell us if you need to use the lab.

Functional Dependencies

Definition:

If two tuples agree on the attributes

A1, A2, …, An

then they must also agree on the attributes

B1, B2, …, Bm

Formally:

A1, A2, …, An -> B1, B2, …, Bm

Motivating example for the study of functional dependencies:

Name    Social Security Number    Phone Number

Examples

• EmpID -> Name, Phone, Position

• Position -> Phone

• but not Phone -> Position

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   lawyer
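These three claims can be checked mechanically. The sketch below is an illustration only, not part of the lecture: the helper name `fd_holds` and the list-of-dicts representation of a relation are assumptions.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. Hypothetical helper, for illustration only.
    """
    seen = {}  # maps each combination of lhs values to its rhs values
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores val the first time key appears; a different
        # val later means two tuples agree on lhs but disagree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": 1234, "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": 1234, "Position": "lawyer"},
]

print(fd_holds(employees, ["EmpID"], ["Name", "Phone", "Position"]))  # True
print(fd_holds(employees, ["Position"], ["Phone"]))                   # True
print(fd_holds(employees, ["Phone"], ["Position"]))                   # False
```

Note that a check like this only shows that a dependency holds in one instance; a functional dependency is a statement about every legal instance of the relation.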

In General

• To check A -> B, erase all other columns

• Check if the remaining relation is many-one (called functional in mathematics)

…  A   …  B
   X1     Y1
   X2     Y2
   …      …

Example

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   lawyer

More Examples

Product: name -> price, manufacturer
Person: ssn -> name, age
Company: name -> stock price, president

A key of a relation is a set of attributes such that:

- it functionally determines all the attributes of the relation
- none of its proper subsets determines all the attributes.

Superkey: a set of attributes that contains a key.

Finding the Keys of a Relation

Given a relation constructed from an E/R diagram, what is its key?

Rules:

1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.

[E/R diagram: entity set Person with attributes address, name, ssn]

Rules for Binary Relationships

Several cases are possible for a binary relationship E1 - E2:

1. Many-many: the key includes the key of E1 together with the key of E2.

What happens for:

2. Many-one:

3. One-one:

[E/R diagram: Person (attributes name, ssn) buys Product (attributes name, price)]

Keys in Multiway Relationships

If there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.

[E/R diagram: relationship Purchase connecting Product, Person, Store and Payment Method]

Rules in FD’s

Splitting/Combining Rule:

A1, A2, …, An -> B1, B2, …, Bm

is equivalent to

A1, A2, …, An -> B1
A1, A2, …, An -> B2
…
A1, A2, …, An -> Bm

Read top to bottom this is the splitting rule; read bottom to top it is the combining rule.

Rules in FD’s (continued)

Trivial Dependency:

A1, A2, …, An -> Ai

Always holds.

Why?

Rules in FD’s (continued)

Transitive Closure Rule:

If

A1, A2, …, An -> B1, B2, …, Bm

and

B1, B2, …, Bm -> C1, C2, …, Cp

then

A1, A2, …, An -> C1, C2, …, Cp

Why?

Closure of a set of Attributes

Given a set of attributes {A1, …, An} and a set of dependencies S.

Problem: find all attributes B such that any relation which satisfies S also satisfies:

A1, …, An -> B

The closure of {A1, …, An}, denoted {A1, …, An}+, is the set of all such attributes B.

Closure Algorithm

Start with X = {A1, …, An}.

Repeat until X doesn’t change:

    if B1, B2, …, Bn -> C is in S, and
       B1, B2, …, Bn are all in X, and
       C is not in X
    then
       add C to X.

Example

A B -> C
A D -> E
B -> D
A F -> B

Closure of {A, B}: X = {A, B, C, D, E}

Closure of {A, F}: X = {A, F, B, C, D, E}
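The closure algorithm can be sketched in a few lines of Python and run on the four dependencies of this example. This is a minimal sketch under assumptions: the function name `closure` and the encoding of each dependency as a pair of Python sets are choices made here, not something the lecture prescribes.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    x = set(attrs)
    changed = True
    while changed:  # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # every attribute of lhs is in X, and rhs adds something new
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


fds = [
    ({"A", "B"}, {"C"}),
    ({"A", "D"}, {"E"}),
    ({"B"}, {"D"}),
    ({"A", "F"}, {"B"}),
]

print(sorted(closure({"A", "B"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure({"A", "F"}, fds)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The loop rescans all dependencies after every change, which is quadratic in the worst case but mirrors the slide's "repeat until X doesn't change" directly.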

Why Is the Algorithm Correct ?

• Show the following by induction:
    – For every B in X:
        • A1, …, An -> B

• Initially X = {A1, …, An} -- holds

• Induction step: B1, …, Bm in X
    – Implies A1, …, An -> B1, …, Bm
    – We also have B1, …, Bm -> C
    – By transitivity we have A1, …, An -> C

• This shows that the algorithm is sound; need to show it is complete

Relational Schema Design

Main idea:

• Start with some relational schema

• Find out its FD’s

• Use them to design a better relational schema

Relational Schema Design

Recall set attributes (persons with several phones):

Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000

Note: SSN is NOT a key here

Problems:

- redundancy
- update anomalies
- deletion anomalies

Relation Decomposition

Break the relation into two:

SSN         Name
123-321-99  Fred
909-438-44  Joe

SSN         Phone Number
123-321-99  (201) 555-1234
123-321-99  (206) 572-4312
909-438-44  (908) 464-0028
909-438-44  (212) 555-4000

Decompositions in General

Let R be a relation with attributes A1, A2, …, An

Create two relations R1 and R2 with attributes

B1, B2, …, Bm and C1, C2, …, Cl

Such that:

{B1, B2, …, Bm} ∪ {C1, C2, …, Cl} = {A1, A2, …, An}

And
-- R1 is the projection of R on B1, B2, …, Bm
-- R2 is the projection of R on C1, C2, …, Cl

Incorrect Decomposition

• Sometimes it is incorrect:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
DoubleClick  29.99  Camera

Decompose on: Name, Category and Price, Category

Incorrect Decomposition

Name         Category
Gizmo        Gadget
OneClick     Camera
DoubleClick  Camera

Price  Category
19.99  Gadget
24.99  Camera
29.99  Camera

When we put it back:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
OneClick     29.99  Camera
DoubleClick  24.99  Camera
DoubleClick  29.99  Camera

Cannot recover information
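The same experiment can be reproduced in code. The sketch below assumes a list-of-dicts representation; `project` and `natural_join` are hypothetical helpers written for this illustration.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, eliminating duplicate tuples."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(r1, r2):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


product = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]

r1 = project(product, ["Name", "Category"])
r2 = project(product, ["Price", "Category"])
rejoined = natural_join(r1, r2)

print(len(product), len(rejoined))  # 3 5
```

Every original tuple is still present after the join, but two extra tuples appear, so the original relation can no longer be told apart from the rejoined one.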

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if and only if:

Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R.

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.

BCNF Decomposition

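Find a dependency that violates the BCNF condition:

A1, A2, …, An -> B1, B2, …, Bm

Heuristic: choose B1, B2, …, Bm “as large as possible”

Decompose:

[Diagram: the attributes of R split into the A’s, the B’s and the others; R1 and R2 overlap on the A’s, so one relation holds the A’s with the B’s and the other holds the A’s with the others]

Continue until there are no BCNF violations left.

Find a 2-attribute relation that is not in BCNF.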

Example Decomposition

Person:

Name  SSN  Age  EyeColor  PhoneNumber

Functional dependencies: SSN -> Name, Age, EyeColor

BCNF: Person1(SSN, Name, Age, EyeColor), Person2(SSN, PhoneNumber)

What if we also had an attribute Draft-worthy, and the FD: Age -> Draft-worthy?

Other Example

• R(A,B,C,D) A -> B, B -> C

• Key: A, D

• Violations of BCNF: A -> B, A -> C, A -> BC

• Pick A -> BC: split into R1(A,B,C) R2(A,D)

• What happens if we pick A -> B first?
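A BCNF check for this example can be sketched with attribute closures. The names `closure` and `bcnf_violations` are assumptions made for illustration. The check only inspects the dependencies it is given (here A -> B and B -> C), not derived ones such as A -> C or A -> BC; that is enough to decide whether R itself is in BCNF, but not to test the relations produced by a decomposition.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def bcnf_violations(all_attrs, fds):
    """Return the given nontrivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        if nontrivial and closure(lhs, fds) != set(all_attrs):
            bad.append((lhs, rhs))
    return bad


attrs = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A", "D"}, fds)))   # ['A', 'B', 'C', 'D']
print(sorted(closure({"A"}, fds)))        # ['A', 'B', 'C']
print(len(bcnf_violations(attrs, fds)))   # 2
```

The first line confirms that {A, D} determines every attribute, the second that A alone determines B and C but not D, so A is not a superkey. Both given dependencies are reported; B -> C violates BCNF as well, since B determines only C.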

Correct Decompositions

A decomposition is lossless if we can recover:

R(A,B,C)
    decompose into R1(A,B) and R2(A,C)
    recover R’(A,B,C) by joining R1 and R2

and R’(A,B,C) = R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R

Decomposition Based on BCNF is Necessarily Lossless

Attributes A, B, C. FD: A -> C

Relations R1(A,B) R2(A,C)

Tuple in R: (a,b,c)
Tuples in R1: (a,b), (a,b’)
Tuples in R2: (a,c), (a,c’)

Tuples in the join of R1 and R2: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’)

Can (a,b,c’) be a bogus tuple? What about (a,b’,c’)?

Example

Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000

What are the dependencies?

What are the keys?

Is it in BCNF?

And Now?

SSN         Name
123-321-99  Fred
909-438-44  Joe

SSN         Phone Number
123-321-99  (201) 555-1234
123-321-99  (206) 572-4312
909-438-44  (908) 464-0028
909-438-44  (212) 555-4000

3NF: A Problem with BCNF

Unit  Company  Product

FD’s: Unit -> Company; Company, Product -> Unit
So, there is a BCNF violation, and we decompose.

Unit  Company      (FD: Unit -> Company)

Unit  Product      (No FDs)

So What’s the Problem?

Unit      Company        Unit      Product
Galaga99  UW             Galaga99  databases
Bingo     UW             Bingo     databases

No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again:

Unit      Company  Product
Galaga99  UW       databases
Bingo     UW       databases

Violates the dependency: company, product -> unit!
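The problem can be replayed in code: each decomposed table passes its local check, yet the joined table breaks the dependency that spans both. The helpers `fd_holds` and `natural_join` below are hypothetical, written only for this sketch.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def natural_join(r1, r2):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


unit_company = [
    {"Unit": "Galaga99", "Company": "UW"},
    {"Unit": "Bingo", "Company": "UW"},
]
unit_product = [
    {"Unit": "Galaga99", "Product": "databases"},
    {"Unit": "Bingo", "Product": "databases"},
]

# The only local dependency, Unit -> Company, is satisfied.
print(fd_holds(unit_company, ["Unit"], ["Company"]))        # True

# After joining, Company, Product -> Unit is violated.
joined = natural_join(unit_company, unit_product)
print(fd_holds(joined, ["Company", "Product"], ["Unit"]))   # False
```

The violation can only be detected by joining, which is the cost of a decomposition that does not preserve dependencies.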

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if and only if:

Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R, or B is part of a key.

What happened to first and second normal forms?

Will we have more normal forms?

Multi-valued Dependencies

SSN         Phone Number    Course
123-321-99  (206) 572-4312  CSE-444
123-321-99  (206) 572-4312  CSE-341
123-321-99  (206) 432-8954  CSE-444
123-321-99  (206) 432-8954  CSE-341

The multi-valued dependencies are:

SSN ->> Phone Number
SSN ->> Course

Definition of Multi-valued Dependency

Given R(A1,…,An,B1,…,Bm,C1,…,Cp)

the MVD A1,…,An ->> B1,…,Bm holds if:

for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp

Definition of MVDs Continued

Equivalently: the decomposition into

R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp)

is lossless

Note: an MVD A1,…,An ->> B1,…,Bm implicitly talks about “the other” attributes C1,…,Cp
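The lossless-decomposition characterization gives a direct way to test an MVD on one instance: project onto the two attribute groups, join on the A's, and compare with the original. This is a sketch under assumptions: `mvd_holds` is a hypothetical helper, and it expects a non-empty list of dicts.

```python
def mvd_holds(rows, lhs, rhs):
    """Check lhs ->> rhs on rows by testing that the split is lossless.

    rows is a non-empty list of dicts; lhs and rhs are lists of names.
    """
    rest = [a for a in rows[0] if a not in lhs and a not in rhs]
    r1 = {tuple(row[a] for a in lhs + rhs) for row in rows}
    r2 = {tuple(row[a] for a in lhs + rest) for row in rows}
    original = {tuple(row[a] for a in lhs + rhs + rest) for row in rows}
    n = len(lhs)
    # join the two projections on the shared lhs values
    joined = {t1 + t2[n:] for t1 in r1 for t2 in r2 if t1[:n] == t2[:n]}
    return joined == original


rows = [
    {"SSN": "123-321-99", "Phone Number": "(206) 572-4312", "Course": "CSE-444"},
    {"SSN": "123-321-99", "Phone Number": "(206) 572-4312", "Course": "CSE-341"},
    {"SSN": "123-321-99", "Phone Number": "(206) 432-8954", "Course": "CSE-444"},
    {"SSN": "123-321-99", "Phone Number": "(206) 432-8954", "Course": "CSE-341"},
]

print(mvd_holds(rows, ["SSN"], ["Phone Number"]))      # True
print(mvd_holds(rows[:3], ["SSN"], ["Phone Number"]))  # False
```

Dropping any one of the four tuples breaks the independence of phones and courses, which is why the second call reports that the MVD no longer holds.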

Rules for MVDs

If A1,…,An -> B1,…,Bm

then A1,…,An ->> B1,…,Bm

Other rules in the book

4th Normal Form (4NF)

R is in 4NF if whenever:

A1,…,An ->> B1,…,Bm

is a nontrivial MVD, then A1,…,An is a superkey

Same as BCNF with FDs replaced by MVDs

Confused by Normal Forms ?

[Diagram: nested regions, with 4NF inside BCNF inside 3NF]

In practice: (1) 3NF is enough, (2) don’t overdo it !

Querying the Database

• How do we specify what we want from our database?
    Find all the employees who earn more than $50,000 and pay taxes in New Jersey.

• We design high-level query languages:
    – SQL (used everywhere)
    – Datalog (used by database theoreticians, their students, friends and family)

• Relational algebra: a basic set of operations on relations that provide the basic principles.

Relational Algebra at a Glance

• Operators: relations as input, new relation as output

• Five basic RA operators:
    – Basic Set Operators
        • union, difference (no intersection, no complement)
    – Selection: σ
    – Projection: π
    – Cartesian Product: X

• Derived operators:
    – Intersection, complement
    – Joins (natural, equi-join, theta join, semi-join)

• When our relations have attribute names:
    – Renaming: ρ

Set Operations

• Binary operations

• Union: all tuples in R1 or R2
    – R1 ∪ R2
    – Example:
        • ActiveEmployees ∪ RetiredEmployees

• Difference: all tuples in R1 and not in R2
    – R1 – R2
    – Example:
        • AllEmployees – RetiredEmployees

Selection

• Unary operation: returns a subset of the tuples which satisfy some condition

• Notation: σ_c(R)

• c is a condition:
    – =, <, >, and, or, not

• Find all employees with salary more than $40,000:
    – σ_{Salary > 40000}(Employee)

Selection Example

Find all employees with salary more than $40,000.

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

Result
SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000
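Selection corresponds to filtering tuples with a predicate. A minimal sketch, assuming relations are lists of dicts and with `select` as a hypothetical helper name:

```python
def select(rows, condition):
    """Selection: keep the tuples for which condition(row) is true."""
    return [row for row in rows if condition(row)]


employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

high_paid = select(employee, lambda row: row["Salary"] > 40000)
print([row["Name"] for row in high_paid])  # ['Alice']
```

The schema of the result is the same as the schema of the input; only the set of tuples shrinks.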

Projection

• Unary operation: returns certain columns

• Eliminates duplicate tuples!

• Notation: π_{A1,…,An}(R)

• Example: project social-security number and names:
    – π_{SSN, Name}(Employee)

Projection Example

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

Result
SSN        Name
999999999  John
777777777  Tony
888888888  Alice
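Projection keeps the named columns and drops duplicate tuples. In this sketch `project` is a hypothetical helper and relations are assumed to be lists of dicts; projecting on DepartmentID is added here only to make the duplicate elimination visible.

```python
def project(rows, attrs):
    """Projection: keep only attrs, eliminating duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:  # set semantics: no duplicates
            result.append(projected)
    return result


employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

print(len(project(employee, ["SSN", "Name"])))  # 3
print(project(employee, ["DepartmentID"]))      # [{'DepartmentID': 1}, {'DepartmentID': 2}]
```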

Cartesian Product

• Binary Operation

• Result is tuples combining any element of R1 with any element of R2, for R1 x R2

• Schema is union of Schema(R1) & Schema(R2)

• Notation: R1 x R2

• Example: Employee x Dependents

• Very rare in practice; but joins are very common.

Cartesian Product Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee_Dependents
Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe
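The product pairs every tuple of one relation with every tuple of the other. A minimal sketch, assuming lists of dicts with disjoint attribute names; `cartesian_product` is a hypothetical helper.

```python
def cartesian_product(r1, r2):
    """Cartesian product; assumes the two schemas share no attribute name."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2]


employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"EmployeeSSN": "999999999", "Dname": "Emily"},
    {"EmployeeSSN": "777777777", "Dname": "Joe"},
]

employee_dependents = cartesian_product(employee, dependents)
print(len(employee_dependents))  # 4
```

The result size is the product of the input sizes, which is why a bare product is rarely what a query wants.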

Join (Natural)

• Most important, expensive and exciting.

• Combines two relations, selecting only related tuples

• Equivalent to a cross product followed by selection

• Resulting schema has all attributes of the two relations, but one copy of join condition attributes

Join Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee_Dependents
Name  SSN        Dname
John  999999999  Emily
Tony  777777777  Joe
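A natural join matches tuples on attributes with the same name, so this sketch first renames EmployeeSSN to SSN in Dependents. That renaming step is an assumption made here to obtain the result table shown on the slide; `rename` and `natural_join` are hypothetical helpers.

```python
def rename(rows, old, new):
    """Renaming: replace attribute name old by new in every tuple."""
    return [{(new if a == old else a): v for a, v in row.items()} for row in rows]


def natural_join(r1, r2):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    return [
        {**t1, **t2}  # shared attributes appear once in the result
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"EmployeeSSN": "999999999", "Dname": "Emily"},
    {"EmployeeSSN": "777777777", "Dname": "Joe"},
]

joined = natural_join(employee, rename(dependents, "EmployeeSSN", "SSN"))
for row in joined:
    print(row)
```

The nested loop is the "cross product followed by selection" reading of the join; real systems use far cheaper strategies.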

Other Joins and Renaming

• Theta join: the join involves a predicate
    – R ⋈_θ S

• Semi-join: the attributes of one relation are included in the other.

• Renaming: ρ

Complex Queries

Product (pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person (per-name, phone number, city)

Find phone numbers of people who bought gizmos from Fred.

Find telephony products that somebody bought
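The first query can be read as a selection on Purchase, a join with Person on buyer = per-name, and a projection on phone number. The sketch below works through it on made-up data: every row is hypothetical, and it assumes that "gizmo" is a value of the product attribute.

```python
# Hypothetical sample data, invented for this illustration only.
purchase = [
    {"buyer": "Joe", "seller": "Fred", "store": "S1", "product": "gizmo"},
    {"buyer": "Ann", "seller": "Bob", "store": "S2", "product": "gizmo"},
    {"buyer": "Ann", "seller": "Fred", "store": "S1", "product": "widget"},
]
person = [
    {"per-name": "Joe", "phone number": "555-0100", "city": "Seattle"},
    {"per-name": "Ann", "phone number": "555-0101", "city": "Seattle"},
]

# Selection: purchases of gizmos sold by Fred.
bought = [p for p in purchase if p["seller"] == "Fred" and p["product"] == "gizmo"]

# Join on buyer = per-name, then projection on phone number.
# A set gives the duplicate elimination that projection requires.
phones = {
    q["phone number"]
    for p in bought
    for q in person
    if p["buyer"] == q["per-name"]
}

print(phones)  # {'555-0100'}
```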

Exercises

Product (pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person (per-name, phone number, city)

Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and did not buy French products.
Ex #4: Find names of people who bought American products and live in Seattle.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

Operations on Bags (and why we care)

Basic operations:

Projection, Selection, Union, Intersection, Set difference, Cartesian product

Join (natural join, theta join)