11/07/2003 Akbar Mokhtarani (LBNL) 1 Normalization of Relational Tables Akbar Mokhtarani LBNL (HENPC group) November 7, 2003


Normalization of Relational Tables

Akbar Mokhtarani

LBNL (HENPC group)

November 7, 2003


Overview
• Relational Model Basics
• Functional Dependencies
• Modification Anomalies
• Normalization


Relational Model
• Introduced by E. F. Codd in 1970
• It consists of:
  • Data structure: in the form of tables
  • Data manipulation: operations used to manipulate data in the relations (e.g., SQL)
  • Data integrity: facilities to maintain the integrity of data when they are manipulated


Data structure (tables)
• Each table consists of a set of named columns (attributes corresponding to some real-world entity)
• Each row corresponds to a record containing data values for a single entity
• Attributes are single-valued and have domains (sets of values)


Properties of Relations (Not all tables are relations)

A table is called a relation if:
• The table has a unique name
• Values are atomic (no repeating groups)
• Each row is uniquely determined by a key
• Each attribute has a unique name
• The order of columns is insignificant
• The order of rows is insignificant


Keys
• Super key: a group of one or more attributes that uniquely identifies a row
• Candidate key: an irreducible super key
• Primary key: the candidate key selected to identify the row
• Alternate key: a candidate key other than the primary key
• Foreign key: a set of attributes of one relation whose values match values of some candidate key of another relation
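The super key and candidate key definitions can be illustrated with a minimal Python sketch. It only inspects one sample instance, whereas a real key is a constraint on every possible instance, so treat the result as a hint rather than proof. The helper names and the sample rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Irreducible super keys of this sample instance, smallest first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A combination containing a smaller key is reducible: skip it.
            if any(set(k) <= set(combo) for k in found):
                continue
            if is_superkey(rows, combo):
                found.append(combo)
    return found

# Hypothetical sample rows (not taken from the slides).
orderline = [
    {"Order_ID": 1, "Product_ID": "P1", "Quantity": 2},
    {"Order_ID": 1, "Product_ID": "P2", "Quantity": 2},
    {"Order_ID": 2, "Product_ID": "P1", "Quantity": 2},
]
print(candidate_keys(orderline, ["Order_ID", "Product_ID", "Quantity"]))
```

All three attributes together form a super key here, but only the pair (Order_ID, Product_ID) is irreducible, so it is the only candidate key reported.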


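Example

CUSTOMER (Customer_ID, Customer_name, Address, City, State, Zip)
ORDER (Order_ID, Order_Date, Customer_ID)
ORDERLINE (Order_ID, Product_ID, Quantity)
PRODUCT (Product_ID, Product_Description, Product_Finish, Prod_Price, On_Hand)

FK: ORDER.Customer_ID references CUSTOMER; ORDERLINE.Order_ID references ORDER; ORDERLINE.Product_ID references PRODUCT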


Integrity Constraints

Major integrity constraints (business rules):
• Domain constraints: values in a column have the same domain (data type and size)
• Entity integrity: non-null primary key
• Referential integrity: if there is a foreign key, each FK value must either match a primary key value in another relation or be null
• Action assertions: action constraints (e.g., no student can take more than 15 units per term)
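As a rough illustration of entity and referential integrity, the sketch below checks both rules on in-memory rows. The function names and sample data are hypothetical; a real DBMS enforces these rules declaratively rather than through application code like this.

```python
def entity_integrity_ok(rows, pk):
    """Every primary-key attribute is non-null and no key value repeats."""
    keys = [tuple(r[a] for a in pk) for r in rows]
    return all(None not in k for k in keys) and len(set(keys)) == len(keys)

def referential_violations(child, fk, parent, pk):
    """Child rows whose FK is neither null nor present in the parent."""
    parent_keys = {tuple(p[a] for a in pk) for p in parent}
    bad = []
    for row in child:
        value = tuple(row[a] for a in fk)
        if all(v is None for v in value):
            continue  # a null foreign key is allowed
        if value not in parent_keys:
            bad.append(row)
    return bad

# Hypothetical sample data.
customers = [{"Customer_ID": 1}, {"Customer_ID": 2}]
orders = [
    {"Order_ID": 10, "Customer_ID": 1},     # matches a customer
    {"Order_ID": 11, "Customer_ID": None},  # null FK: allowed
    {"Order_ID": 12, "Customer_ID": 7},     # dangling reference
]
print(entity_integrity_ok(orders, ["Order_ID"]))
print(referential_violations(orders, ["Customer_ID"], customers, ["Customer_ID"]))
```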


Functional Dependency (FD) (Relationship Among Attributes)

A functional dependency is a special integrity constraint that states:

FD: X → Y means if t1.X = t2.X then t1.Y = t2.Y

Where:
• X and Y are subsets of attributes of a relation R
• t1 and t2 are tuples of any relational instance of R
• X is said to functionally determine Y, or Y is functionally dependent on X
• X is called the determinant
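The definition translates almost directly into code. The sketch below tests whether a dependency holds on one given instance; it can refute an FD but cannot prove it for all instances. The function name and the sample rows are illustrative assumptions.

```python
def fd_holds(rows, X, Y):
    """Check X -> Y on one instance: equal X-values must imply equal Y-values."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        # Remember the first Y-value seen for each X-value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows.
rows = [
    {"StudentID": 1007, "StudentName": "Alice Walker", "ClassName": "Math 105", "Grade": "A"},
    {"StudentID": 1007, "StudentName": "Alice Walker", "ClassName": "Chem 210", "Grade": "B"},
    {"StudentID": 1005, "StudentName": "John Doe", "ClassName": "Math 105", "Grade": "B"},
]
print(fd_holds(rows, ["StudentID"], ["StudentName"]))         # True
print(fd_holds(rows, ["StudentID"], ["Grade"]))               # False
print(fd_holds(rows, ["StudentID", "ClassName"], ["Grade"]))  # True
```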


Functional Dependency (Cont’d)
• Full FD: X → Y is a full FD if removal of any attribute from X destroys the dependency
• Partial FD: X → Y is partial if Y is already determined by a proper subset of X (for example, a non-key attribute determined by only part of a composite key)


FD Rules

1. Reflexive: If Y ⊆ X, then X → Y

2. Augmentation: If X → Y, then XZ → YZ

3. Transitive: If X → Y and Y → Z, then X → Z

4. Decomposition: If X → YZ, then X → Y and X → Z

5. Union: If X → Y and X → Z, then X → YZ

6. Pseudo-transitive: If X → Y and WY → Z, then WX → Z
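One common way to apply these rules mechanically is the attribute-closure algorithm: keep adding the right-hand side of every FD whose left-hand side is already covered. The sketch below is one minimal version; the names `closure` and `implies` are chosen here for illustration.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is covered, everything it determines is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, X, Y):
    """True if X -> Y follows from fds."""
    return set(Y) <= closure(X, fds)

# Pseudo-transitivity: from X -> Y and WY -> Z we can derive WX -> Z.
fds = [({"X"}, {"Y"}), ({"W", "Y"}, {"Z"})]
print(implies(fds, {"W", "X"}, {"Z"}))  # True
print(implies(fds, {"X"}, {"Z"}))       # False
```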


FD Example

A   B   C   D
a1  b1  c1  d1
a1  b2  c2  d1
a2  b1  c1  d2
a1  b1  c1  d2

AB → C, but AB ↛ D
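Both claims about the example table (AB determines C, AB does not determine D) can be checked mechanically. The sketch below hard-codes the four rows of the slide; the helper name `fd_holds` is an illustrative choice.

```python
COLUMNS = "ABCD"
ROWS = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c2", "d1"),
    ("a2", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d2"),
]

def fd_holds(lhs, rhs, rows=ROWS):
    """Check lhs -> rhs, where lhs and rhs are strings of column letters."""
    left = [COLUMNS.index(a) for a in lhs]
    right = [COLUMNS.index(a) for a in rhs]
    seen = {}
    for row in rows:
        x = tuple(row[i] for i in left)
        y = tuple(row[i] for i in right)
        if seen.setdefault(x, y) != y:
            return False
    return True

print(fd_holds("AB", "C"))  # True
print(fd_holds("AB", "D"))  # False: rows 1 and 4 agree on A, B but differ on D
```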


Modification Anomalies

Anomalies are unexpected side effects that occur when modifying the contents of a table with excessive redundancies:
• Insertion anomaly: need to add extra data in order to add the desired data to the DB
• Deletion anomaly: deleting a row causes other data to be deleted
• Update anomaly: need to change multiple rows to modify a single fact


Normalization
• Normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations
• It is built around the concept of normal forms
• A relation is said to be in a particular normal form if it satisfies certain conditions


Levels of Normalization

Each level is a stricter subset of the one before it:

1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF ⊃ 4NF ⊃ 5NF ⊃ Domain/Key NF


A relation is in:
• 1NF if it contains no multivalued attributes
• 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the PK
• 3NF if it is in 2NF and no transitive dependencies exist
• BCNF if every determinant is a candidate key
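The BCNF condition can be tested with attribute closures: a non-trivial FD violates BCNF when the closure of its determinant does not cover the whole relation. The sketch below uses this test, which accepts any super key as a determinant, the usual formal reading of the slide's "candidate key". The FDs listed for the student table are assumptions made for illustration; the slides do not spell them out.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]

# Assumed FDs for a single wide student/class table.
attributes = {"StudentID", "StudentName", "Major", "SSN",
              "ClassName", "Description", "Grade"}
fds = [
    ({"StudentID"}, {"StudentName", "Major", "SSN"}),
    ({"ClassName"}, {"Description"}),
    ({"StudentID", "ClassName"}, {"Grade"}),
]
for lhs, rhs in bcnf_violations(attributes, fds):
    print(sorted(lhs), "->", sorted(rhs))
```

Under these assumptions the first two FDs are reported, because StudentID alone and ClassName alone each determine only part of the table.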


Steps in Normalization

Table with multivalued attributes
  ↓ Remove multivalued attributes
First normal form
  ↓ Remove partial dependencies
Second normal form
  ↓ Remove transitive dependencies
Third normal form
  ↓ Remove remaining anomalies resulting from FDs
Boyce-Codd normal form


First Normal Form

StudentID  StudentName   Major        ClassName  Description    Grade  SSN
1002       Mary Smith    Accounting   Math 102   Algebra I      A      112-23-3214
1005       John Doe      Physics      Math 105   Calculus I     B      213-66-3456
1007       Alice Walker  Chemistry    Math 105   Calculus I     A      342-43-5690
1002       Mary Smith    Accounting   Stat 120   Statistics I   C      112-23-3214
1007       Alice Walker  Chemistry    Chem 210   Org. Chem      B      342-43-5690
1003       Dave Smith    Electronics  Math 105   Calculus I     A      465-34-5465
1003       Dave Smith    Electronics  Chem 210   Org. Chem      C      465-34-5465
1010       Dian Hall     Physics      Lit 100    English Lit I  B      753-23-0958
1012       Lisa Gilmore  Sociology    Soc 101    Intro. Soc     A      564-34-9078

This relation contains:

• Insertion anomaly: adding a new department or class requires a student to sign up for it

• Deletion anomaly: deleting “Lisa Gilmore” causes information about the “Sociology” department and the “Soc 101” class to be lost

• Update anomaly: if the course description for “Math 105” changes, many rows need to be updated


FD diagram

[Figure: functional dependencies among SID, StName, Major, ClassName, Desc., Grade, SSN]


2NF and 3NF

StudentID  StudentName   SSN          Major
1002       Mary Smith    112-23-3214  Accounting
1005       John Doe      213-66-3456  Physics
1007       Alice Walker  342-43-5690  Chemistry
1003       Dave Smith    465-34-5465  Electronics
1010       Dian Hall     753-23-0958  Physics
1012       Lisa Gilmore  564-34-9078  Sociology

ClassName  Description
Math 102   Algebra I
Math 105   Calculus I
Stat 120   Statistics I
Chem 210   Org. Chem
Lit 100    English Lit I
Soc 101    Intro. Soc

StudentID  ClassName  Grade
1002       Math 102   A
1005       Math 105   B
1007       Math 105   A
1002       Stat 120   C
1007       Chem 210   B
1003       Math 105   A
1003       Chem 210   C
1010       Lit 100    B
1012       Soc 101    A

We still have insertion and deletion anomalies for “Major”
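The decomposition into three tables can be sketched as three projections of the original wide table, followed by a check that joining them back on their keys reproduces the original rows (a lossless decomposition). Only the first four rows of the example are used to keep the sketch short; the variable and function names are illustrative.

```python
COLS = ["StudentID", "StudentName", "Major", "ClassName", "Description", "Grade", "SSN"]
data = [
    (1002, "Mary Smith", "Accounting", "Math 102", "Algebra I", "A", "112-23-3214"),
    (1005, "John Doe", "Physics", "Math 105", "Calculus I", "B", "213-66-3456"),
    (1007, "Alice Walker", "Chemistry", "Math 105", "Calculus I", "A", "342-43-5690"),
    (1002, "Mary Smith", "Accounting", "Stat 120", "Statistics I", "C", "112-23-3214"),
]
original = [dict(zip(COLS, row)) for row in data]

def project(relation, attrs):
    """Keep only attrs and drop duplicate rows, preserving first-seen order."""
    seen, result = set(), []
    for t in relation:
        sub = tuple(t[a] for a in attrs)
        if sub not in seen:
            seen.add(sub)
            result.append(dict(zip(attrs, sub)))
    return result

students = project(original, ["StudentID", "StudentName", "SSN", "Major"])
classes = project(original, ["ClassName", "Description"])
enrollments = project(original, ["StudentID", "ClassName", "Grade"])

# Rebuild the wide table by looking up each enrollment's student and class.
by_id = {s["StudentID"]: s for s in students}
by_name = {c["ClassName"]: c for c in classes}
rebuilt = [{**by_id[e["StudentID"]], **by_name[e["ClassName"]], **e} for e in enrollments]

print(len(students), len(classes), len(enrollments))  # 3 3 4
print(rebuilt == original)                            # True
```

Each student and each class description is now stored once, so changing the description of "Math 105" touches a single row.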


Anomaly-Free Form

StudentID  StudentName   SSN          Major
1002       Mary Smith    112-23-3214  Accounting
1005       John Doe      213-66-3456  Physics
1007       Alice Walker  342-43-5690  Chemistry
1003       Dave Smith    465-34-5465  Electronics
1010       Dian Hall     753-23-0958  Physics
1012       Lisa Gilmore  564-34-9078  Sociology

ClassName  Description    Dept_ID
Math 102   Algebra I      1003
Math 105   Calculus I     1003
Stat 120   Statistics I   1003
Chem 210   Org. Chem      1001
Lit 100    English Lit I  1007
Soc 101    Intro. Soc     1006

Dept_ID  Department
1000     Accounting
1001     Chemistry
1002     Electronics
1003     Mathematics
1004     Music
1005     Physics
1006     Sociology
1007     English

StudentID  ClassName  Grade
1002       Math 102   A
1005       Math 105   B
1007       Math 105   A
1002       Stat 120   C
1007       Chem 210   B
1003       Math 105   A
1003       Chem 210   C
1010       Lit 100    B
1012       Soc 101    A


Another example

1NF:  (A, B, C, D, E, G)

2NF:  (B, C, D, E, G)  (C, A)

3NF:  (B, C, D, E)  (D, G)  (C, A)

BCNF (switch keys):  (C, D, E)  (E, B)  (D, G)  (C, A)


Relational Algebra
• The manipulative part of the relational model is called relational algebra
• It is a collection of operators that take relations as their operands and return a relation as their result
• Two groups of operators:
  • Set operators: union, intersection, difference, and Cartesian product
  • Relational operators: restrict (select), project, join, and divide


Relational Operators
• Restrict: returns a relation containing all tuples from a specified relation that satisfy a specified condition
• Project: returns a relation containing all (sub)tuples that remain in a specified relation after specified attributes have been removed
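Restrict and project can be sketched in a few lines of Python over a relation stored as a list of dicts. This is an illustration rather than how a DBMS implements them; note that `project` here names the attributes to keep, which is equivalent to removing all the others. The sample rows are taken from the enrollment data used earlier.

```python
def restrict(relation, predicate):
    """Keep the tuples for which predicate(tuple) is true."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Keep only attrs in each tuple, then drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        sub = tuple(t[a] for a in attrs)
        if sub not in seen:
            seen.add(sub)
            result.append(dict(zip(attrs, sub)))
    return result

enrollment = [
    {"StudentID": 1005, "ClassName": "Math 105", "Grade": "B"},
    {"StudentID": 1007, "ClassName": "Math 105", "Grade": "A"},
    {"StudentID": 1007, "ClassName": "Chem 210", "Grade": "B"},
]
math_105 = restrict(enrollment, lambda t: t["ClassName"] == "Math 105")
print(len(math_105))                       # 2
print(project(enrollment, ["StudentID"]))  # duplicates of 1007 collapse to one tuple
```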


Set Operators
• Product: returns a relation containing all possible tuples that are a combination of two tuples, one from each of two specified relations
• Union: returns a relation containing all tuples that appear in either or both of two specified relations


Set Operators (cont’d)
• Intersect: returns a relation containing all tuples that appear in both of two specified relations
• Difference: returns a relation containing all tuples that appear in the first and not in the second of two specified relations
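Product, union, intersect, and difference map closely onto Python's built-in set operations once each tuple is made hashable. In the sketch below a tuple is a frozenset of (attribute, value) pairs; the helper names `rel` and `product` and the sample relations are hypothetical.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}

def product(r, s):
    """Cartesian product; assumes r and s share no attribute names."""
    return {a | b for a in r for b in s}

a = rel({"ID": 1}, {"ID": 2})
b = rel({"ID": 2}, {"ID": 3})
c = rel({"Color": "red"}, {"Color": "blue"})

print(a | b == rel({"ID": 1}, {"ID": 2}, {"ID": 3}))  # union
print(a & b == rel({"ID": 2}))                        # intersect
print(a - b == rel({"ID": 1}))                        # difference
print(len(product(a, c)))                             # 4 combined tuples
```

Union, intersect, and difference only make sense when both relations have the same attributes, which is why `a` and `b` share the single attribute ID.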


Relational Operators (cont’d)
• Join: returns a relation containing all possible tuples that are a combination of two tuples, one from each of two specified relations, such that the two tuples contributing to any given combination have a common value for the common attributes of the two relations
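A natural join along these lines can be sketched as a nested loop that keeps only the pairs agreeing on every shared attribute. The function name is illustrative, and the sample rows are a subset of the class and department tables shown earlier.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # attribute names shared by both relations
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[k] == b[k] for k in common)
    ]

classes = [
    {"ClassName": "Math 105", "Dept_ID": 1003},
    {"ClassName": "Chem 210", "Dept_ID": 1001},
]
departments = [
    {"Dept_ID": 1001, "Department": "Chemistry"},
    {"Dept_ID": 1003, "Department": "Mathematics"},
    {"Dept_ID": 1004, "Department": "Music"},
]
for row in natural_join(classes, departments):
    print(row["ClassName"], "->", row["Department"])
```

The Music department has no matching class, so it does not appear in the result.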


Relational Operators (cont’d)
• Divide: takes two unary relations and one binary relation and returns a relation containing all tuples from one unary relation that appear in the binary relation matched with all tuples in the other unary relation
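Divide is the least intuitive operator, so a small sketch may help. Here the two unary relations are plain sets of values and the binary relation is a set of pairs; the function and parameter names are illustrative, and the pairs are taken from the enrollment data shown earlier. The query answers "which students are enrolled in all of the listed classes?"

```python
def divide(dividend, mediator, divisor):
    """Values x in dividend such that (x, y) is in mediator for every y in divisor."""
    return {x for x in dividend if all((x, y) in mediator for y in divisor)}

students = {1002, 1003, 1005, 1007}       # unary relation (dividend)
wanted = {"Math 105", "Chem 210"}         # unary relation (divisor)
enrolled = {                              # binary relation (mediator)
    (1002, "Math 102"), (1002, "Stat 120"),
    (1005, "Math 105"),
    (1007, "Math 105"), (1007, "Chem 210"),
    (1003, "Math 105"), (1003, "Chem 210"),
}
print(sorted(divide(students, enrolled, wanted)))  # [1003, 1007]
```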