11/07/2003 Akbar Mokhtarani (LBNL) 1
Normalization of Relational Tables
Akbar Mokhtarani
LBNL (HENPC group)
November 7, 2003
Overview
• Relational Model Basics
• Functional Dependencies
• Modification Anomalies
• Normalization
Relational Model
Introduced by E. F. Codd in 1970. It consists of:
• Data structure: data organized in the form of tables
• Data manipulation: operations used to manipulate data in the relations (e.g., SQL)
• Data integrity: facilities to maintain the integrity of data when they are manipulated
Data Structure (Tables)
• Each table consists of a set of named columns (attributes describing some real-world entity)
• Each row corresponds to a record containing the data values for a single entity
• Attributes are single-valued and have domains (the set of allowed values)
Properties of Relations (Not all tables are relations)
A table is called a relation if:
• The table has a unique name
• Values are atomic (no repeating groups)
• Each row is uniquely identified by a key
• Each attribute has a unique name
• The order of columns is insignificant
• The order of rows is insignificant
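Several of these conditions can be checked mechanically for a table held in memory. The following is a minimal Python sketch, assuming a table is represented as a list of column names plus a list of row tuples (a hypothetical representation chosen for illustration). It checks unique column names, atomic values (treating a list, set, tuple, or dict inside a cell as a repeating group), and the absence of duplicate rows; the table-name condition is not checked, and row and column order play no role.

```python
CONTAINERS = (list, tuple, set, frozenset, dict)

def relation_problems(columns, rows):
    """Return the reasons a table is not a relation (an empty list if none found)."""
    problems = []
    # Each attribute must have a unique name.
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    for row in rows:
        if len(row) != len(columns):
            problems.append(f"row {row!r} has the wrong number of values")
        # A container stored in a cell is treated as a repeating group.
        if any(isinstance(v, CONTAINERS) for v in row):
            problems.append(f"row {row!r} contains a non-atomic value")
    # Duplicate rows cannot be told apart by any key.
    atomic = all(not isinstance(v, CONTAINERS) for row in rows for v in row)
    if atomic and len(set(map(tuple, rows))) != len(rows):
        problems.append("duplicate rows")
    return problems

# A student with a repeating group of classes is not in relational form.
print(relation_problems(["StudentID", "ClassName"], [(1002, ["Math 102", "Stat 120"])]))
```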
Keys
• Super key: a group of one or more attributes that uniquely identifies a row
• Candidate key: an irreducible (minimal) super key
• Primary key: the candidate key selected to identify the rows
• Alternate key: a candidate key other than the primary key
• Foreign key: a set of attributes of one relation whose values match values of some candidate key of another relation
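The super key and candidate key definitions can be tested against a table instance. This minimal Python sketch assumes rows are dicts keyed by attribute name, and the ORDERLINE values are invented for illustration. Keys are really properties of the schema, so checking a single instance can only show that a set of attributes is *not* a super key; a set that passes is merely consistent with the data.

```python
from itertools import combinations

# Hypothetical ORDERLINE instance (values invented for illustration).
orderline = [
    {"Order_ID": 1, "Product_ID": 10, "Quantity": 2},
    {"Order_ID": 1, "Product_ID": 20, "Quantity": 1},
    {"Order_ID": 2, "Product_ID": 10, "Quantity": 2},
]

def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs in this instance."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)

def is_candidate_key(rows, attrs):
    """A super key from which no attribute can be dropped (irreducible)."""
    attrs = tuple(attrs)
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_superkey(orderline, ["Order_ID"]))                                 # False
print(is_candidate_key(orderline, ["Order_ID", "Product_ID"]))              # True
print(is_candidate_key(orderline, ["Order_ID", "Product_ID", "Quantity"]))  # False: reducible
```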
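Example
CUSTOMER (Customer_ID, Customer_Name, Address, City, State, Zip)
ORDER (Order_ID, Order_Date, Customer_ID)
ORDERLINE (Order_ID, Product_ID, Quantity)
PRODUCT (Product_ID, Product_Description, Product_Finish, Prod_Price, On_Hand)
Foreign keys (FK): Customer_ID in ORDER references CUSTOMER; Order_ID and Product_ID in ORDERLINE reference ORDER and PRODUCT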
Integrity Constraints
Major integrity constraints (business rules):
• Domain constraints: all values in a column come from the same domain (data type and size)
• Entity integrity: the primary key must not be null
• Referential integrity: if there is a foreign key, each FK value must either match a primary key value in another relation or be null
• Action assertions: action constraints (e.g., no student can take more than 15 units per term)
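SQL databases can enforce the first three kinds of constraint declaratively. This minimal sketch uses Python's built-in `sqlite3` module with two of the tables from the example above; the text-valued keys, the column types, and the date-format `CHECK` rule are assumptions made for illustration. Action assertions such as the 15-unit rule usually need triggers or application code and are not shown. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless this is on

conn.executescript("""
CREATE TABLE CUSTOMER (
    Customer_ID   TEXT NOT NULL PRIMARY KEY,  -- entity integrity: non-null, unique
    Customer_Name TEXT NOT NULL
);
CREATE TABLE "ORDER" (
    Order_ID    TEXT NOT NULL PRIMARY KEY,
    -- domain constraint (assumed format YYYY-MM-DD)
    Order_Date  TEXT NOT NULL CHECK (Order_Date LIKE '____-__-__'),
    -- referential integrity: must match a CUSTOMER row or be NULL
    Customer_ID TEXT REFERENCES CUSTOMER (Customer_ID)
);
""")

def try_insert(sql, params):
    """Run one INSERT in its own transaction and report the outcome."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

with conn:
    conn.execute("INSERT INTO CUSTOMER VALUES ('C1', 'Mary Smith')")

order_sql = 'INSERT INTO "ORDER" VALUES (?, ?, ?)'
print(try_insert(order_sql, ("O1", "2003-11-07", "C1")))   # FK matches a customer
print(try_insert(order_sql, ("O2", "2003-11-07", "C99")))  # FK matches no customer
print(try_insert(order_sql, ("O3", "2003-11-07", None)))   # null FK is allowed
print(try_insert(order_sql, ("O4", "yesterday", "C1")))    # violates the domain rule
print(try_insert("INSERT INTO CUSTOMER VALUES (?, ?)", (None, "Nobody")))  # null PK
```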
Functional Dependency (FD) (Relationship Among Attributes)
A functional dependency is a special integrity constraint that states:
FD: $X \to Y$ means that if $t_1.X = t_2.X$ then $t_1.Y = t_2.Y$, where:
• $X$ and $Y$ are subsets of the attributes of a relation $R$
• $t_1$ and $t_2$ are tuples of any relation instance of $R$
• $X$ is said to functionally determine $Y$, or $Y$ is functionally dependent on $X$
• $X$ is called the determinant
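The definition translates directly into a check over rows: remember the $Y$ value seen for each $X$ value and fail on any conflict. This is a minimal Python sketch assuming rows are dicts (a hypothetical representation), using a few rows from the student table later in these slides. Because an FD constrains every instance of $R$, a check on one instance can only refute an FD, never prove it.

```python
rows = [
    {"StudentID": 1002, "StudentName": "Mary Smith", "ClassName": "Math 102", "Grade": "A"},
    {"StudentID": 1005, "StudentName": "John Doe", "ClassName": "Math 105", "Grade": "B"},
    {"StudentID": 1007, "StudentName": "Alice Walker", "ClassName": "Math 105", "Grade": "A"},
    {"StudentID": 1002, "StudentName": "Mary Smith", "ClassName": "Stat 120", "Grade": "C"},
]

def fd_holds(rows, X, Y):
    """Check X -> Y on one instance: rows that agree on X must agree on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        # setdefault stores y for a new x, or returns the y already seen for x
        if seen.setdefault(x, y) != y:
            return False
    return True

print(fd_holds(rows, ["StudentID"], ["StudentName"]))  # True
print(fd_holds(rows, ["ClassName"], ["Grade"]))        # False: Math 105 has grades B and A
```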
Functional Dependency (Cont’d)
• Full FD: the FD $X \to Y$ is full if removing any attribute from $X$ destroys the dependency
• Partial FD: the FD $X \to Y$ is partial if $Y$ is already determined by a proper subset of $X$ (in normalization, typically a non-key attribute that depends on only part of a composite key)
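On a given instance, full and partial FDs can be told apart by dropping one attribute of $X$ at a time. Dropping single attributes is enough: if some proper subset of $X$ determines $Y$, then so does every larger subset containing it. A minimal Python sketch, with a hypothetical `classify_fd` helper and the same instance-only caveat as any data-based FD check:

```python
rows = [
    {"StudentID": 1002, "StudentName": "Mary Smith", "ClassName": "Math 102", "Grade": "A"},
    {"StudentID": 1005, "StudentName": "John Doe", "ClassName": "Math 105", "Grade": "B"},
    {"StudentID": 1007, "StudentName": "Alice Walker", "ClassName": "Math 105", "Grade": "A"},
    {"StudentID": 1002, "StudentName": "Mary Smith", "ClassName": "Stat 120", "Grade": "C"},
]

def fd_holds(rows, X, Y):
    """Rows that agree on X must agree on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def classify_fd(rows, X, Y):
    """Return 'full', 'partial', or 'does not hold' for X -> Y on this instance."""
    if not fd_holds(rows, X, Y):
        return "does not hold"
    for attr in X:
        reduced = [a for a in X if a != attr]
        if fd_holds(rows, reduced, Y):
            return "partial"  # a proper subset of X already determines Y
    return "full"

key = ["StudentID", "ClassName"]
print(classify_fd(rows, key, ["StudentName"]))  # partial (StudentID alone suffices)
print(classify_fd(rows, key, ["Grade"]))        # full
```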
FD Rules
1. Reflexive: If $Y \subseteq X$, then $X \to Y$
2. Augmentation: If $X \to Y$, then $XZ \to YZ$
3. Transitive: If $X \to Y$ and $Y \to Z$, then $X \to Z$
4. Decomposition: If $X \to YZ$, then $X \to Y$ and $X \to Z$
5. Union: If $X \to Y$ and $X \to Z$, then $X \to YZ$
6. Pseudotransitive: If $X \to Y$ and $WY \to Z$, then $WX \to Z$
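Applying these rules repeatedly to a set of FDs gives the closure $X^+$ of an attribute set $X$, the set of all attributes that $X$ determines; $X \to Y$ follows from the given FDs exactly when $Y \subseteq X^+$. Below is a minimal Python sketch of the standard fixpoint algorithm, assuming FDs are given as (left side, right side) pairs of sets (an illustrative representation).

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)  # reflexivity: X determines every subset of itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # once lhs is determined, rhs is determined too (augmentation + transitivity)
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, X, Y):
    """True if X -> Y follows from fds."""
    return set(Y) <= closure(X, fds)

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(sorted(closure({"A"}, fds)))      # ['A', 'B', 'C']
print(implies(fds, {"A", "D"}, {"E"}))  # True: A -> C and CD -> E give AD -> E (pseudotransitive)
```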
FD Example
A  | B  | C  | D
a1 | b1 | c1 | d1
a1 | b2 | c2 | d1
a2 | b1 | c1 | d2
a1 | b1 | c1 | d2
$AB \to C$, but $AB \not\to D$ (rows 1 and 4 agree on A and B but differ on D)
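Both claims can be confirmed directly on the four rows above. A minimal Python sketch (the `fd_holds` helper is illustrative); it also shows that A alone does not determine C, since a1 appears with both c1 and c2.

```python
COLS = "ABCD"
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c2", "d1"),
    ("a2", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d2"),
]

def fd_holds(X, Y):
    """Rows that agree on every attribute in X must agree on every attribute in Y."""
    seen = {}
    for row in rows:
        val = dict(zip(COLS, row))
        x = tuple(val[a] for a in X)
        y = tuple(val[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

print(fd_holds("AB", "C"))  # True
print(fd_holds("AB", "D"))  # False: (a1, b1) occurs with both d1 and d2
print(fd_holds("A", "C"))   # False: a1 occurs with both c1 and c2
```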
Modification Anomalies
Anomalies are unexpected side effects that occur when modifying the contents of a table with excessive redundancy:
• Insertion anomaly: extra data must be added in order to add the desired data to the DB
• Deletion anomaly: deleting a row causes other data to be lost
• Update anomaly: multiple rows must be changed to modify a single fact
Normalization
• Normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations
• It is built around the concept of normal forms: a relation is said to be in a particular normal form if it satisfies certain conditions
Levels of Normalization
From least to most restrictive: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, Domain/Key NF
A relation is in:
• 1NF if it contains no multivalued attributes
• 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the PK
• 3NF if it is in 2NF and no transitive dependencies exist
• BCNF if every determinant is a candidate key
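The BCNF condition can be tested with attribute closure: for every nontrivial FD $X \to Y$, the closure $X^+$ must cover all attributes, i.e., $X$ must be a superkey. The minimal Python sketch below applies this to the unnormalized student table; the three FDs are an assumption, chosen to be consistent with the decomposition of that table later in these slides. Checking only the given FDs is sufficient for the schema they were stated on, but not for a decomposed sub-schema.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose determinant is not a superkey of schema."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]

schema = {"StudentID", "StudentName", "SSN", "Major", "ClassName", "Description", "Grade"}
fds = [  # assumed FDs for the student table
    ({"StudentID"}, {"StudentName", "SSN", "Major"}),
    ({"ClassName"}, {"Description"}),
    ({"StudentID", "ClassName"}, {"Grade"}),
]
for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs))
# prints:
# ['StudentID'] -> ['Major', 'SSN', 'StudentName']
# ['ClassName'] -> ['Description']
```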
Steps in Normalization
Table with multivalued attributes
→ Remove multivalued attributes → First normal form
→ Remove partial dependencies → Second normal form
→ Remove transitive dependencies → Third normal form
→ Remove remaining anomalies resulting from FDs → Boyce-Codd normal form
First Normal Form
StudentID | StudentName  | Major       | ClassName | Description   | Grade | SSN
1002      | Mary Smith   | Accounting  | Math 102  | Algebra I     | A     | 112-23-3214
1005      | John Doe     | Physics     | Math 105  | Calculus I    | B     | 213-66-3456
1007      | Alice Walker | Chemistry   | Math 105  | Calculus I    | A     | 342-43-5690
1002      | Mary Smith   | Accounting  | Stat 120  | Statistics I  | C     | 112-23-3214
1007      | Alice Walker | Chemistry   | Chem 210  | Org. Chem     | B     | 342-43-5690
1003      | Dave Smith   | Electronics | Math 105  | Calculus I    | A     | 465-34-5465
1003      | Dave Smith   | Electronics | Chem 210  | Org. Chem     | C     | 465-34-5465
1010      | Dian Hall    | Physics     | Lit 100   | English Lit I | B     | 753-23-0958
1012      | Lisa Gilmore | Sociology   | Soc 101   | Intro. Soc    | A     | 564-34-9078
This relation has the following anomalies:
• Insertion anomaly: adding a new department or class requires a student to sign up for it
• Deletion anomaly: deleting “Lisa Gilmore” causes the information about the “Sociology” department and the “Soc 101” class to be lost
• Update anomaly: if the course description for “Math 105” changes, multiple rows need to be updated
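The update and deletion anomalies can be seen by querying the table above directly. A minimal Python sketch that keeps only the StudentName, Major, ClassName, and Description columns as plain tuples:

```python
# (StudentName, Major, ClassName, Description) from the 1NF table above
rows = [
    ("Mary Smith", "Accounting", "Math 102", "Algebra I"),
    ("John Doe", "Physics", "Math 105", "Calculus I"),
    ("Alice Walker", "Chemistry", "Math 105", "Calculus I"),
    ("Mary Smith", "Accounting", "Stat 120", "Statistics I"),
    ("Alice Walker", "Chemistry", "Chem 210", "Org. Chem"),
    ("Dave Smith", "Electronics", "Math 105", "Calculus I"),
    ("Dave Smith", "Electronics", "Chem 210", "Org. Chem"),
    ("Dian Hall", "Physics", "Lit 100", "English Lit I"),
    ("Lisa Gilmore", "Sociology", "Soc 101", "Intro. Soc"),
]

# Update anomaly: one fact (the description of Math 105) is stored in several rows.
to_update = [r for r in rows if r[2] == "Math 105"]
print(len(to_update))  # 3

# Deletion anomaly: removing Lisa Gilmore's row removes every trace of
# the Sociology major and the Soc 101 class.
remaining = [r for r in rows if r[0] != "Lisa Gilmore"]
print(any(r[1] == "Sociology" for r in remaining))  # False
print(any(r[2] == "Soc 101" for r in remaining))    # False
```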
FD diagram (attributes: SID, StName, Major, ClassName, Desc., Grade, SSN)
2NF and 3NF
StudentID | StudentName  | SSN         | Major
1002      | Mary Smith   | 112-23-3214 | Accounting
1005      | John Doe     | 213-66-3456 | Physics
1007      | Alice Walker | 342-43-5690 | Chemistry
1003      | Dave Smith   | 465-34-5465 | Electronics
1010      | Dian Hall    | 753-23-0958 | Physics
1012      | Lisa Gilmore | 564-34-9078 | Sociology

ClassName | Description
Math 102  | Algebra I
Math 105  | Calculus I
Stat 120  | Statistics I
Chem 210  | Org. Chem
Lit 100   | English Lit I
Soc 101   | Intro. Soc

StudentID | ClassName | Grade
1002      | Math 102  | A
1005      | Math 105  | B
1007      | Math 105  | A
1002      | Stat 120  | C
1007      | Chem 210  | B
1003      | Math 105  | A
1003      | Chem 210  | C
1010      | Lit 100   | B
1012      | Soc 101   | A

We still have insertion and deletion anomalies for “Major”: a major exists in the database only while some student has declared it.
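One way to confirm that this decomposition loses no information is to project the 1NF table onto the three tables and join them back on their shared columns; a lossless decomposition returns exactly the original rows. A minimal Python sketch, representing each tuple as a frozenset of (attribute, value) pairs; the `project` and `natural_join` helpers are illustrative, and the check covers only this instance.

```python
COLS = ("StudentID", "StudentName", "Major", "ClassName", "Description", "Grade", "SSN")
ROWS = [
    (1002, "Mary Smith", "Accounting", "Math 102", "Algebra I", "A", "112-23-3214"),
    (1005, "John Doe", "Physics", "Math 105", "Calculus I", "B", "213-66-3456"),
    (1007, "Alice Walker", "Chemistry", "Math 105", "Calculus I", "A", "342-43-5690"),
    (1002, "Mary Smith", "Accounting", "Stat 120", "Statistics I", "C", "112-23-3214"),
    (1007, "Alice Walker", "Chemistry", "Chem 210", "Org. Chem", "B", "342-43-5690"),
    (1003, "Dave Smith", "Electronics", "Math 105", "Calculus I", "A", "465-34-5465"),
    (1003, "Dave Smith", "Electronics", "Chem 210", "Org. Chem", "C", "465-34-5465"),
    (1010, "Dian Hall", "Physics", "Lit 100", "English Lit I", "B", "753-23-0958"),
    (1012, "Lisa Gilmore", "Sociology", "Soc 101", "Intro. Soc", "A", "564-34-9078"),
]
original = {frozenset(zip(COLS, row)) for row in ROWS}

def project(rel, attrs):
    """Keep only the named attributes; duplicates collapse because rel is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r, s):
    """Combine tuples that agree on every attribute the two relations share."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return out

student = project(original, {"StudentID", "StudentName", "SSN", "Major"})
classes = project(original, {"ClassName", "Description"})
enrollment = project(original, {"StudentID", "ClassName", "Grade"})

print(len(student), len(classes), len(enrollment))                           # 6 6 9
print(natural_join(natural_join(enrollment, student), classes) == original)  # True
```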
Anomaly-Free Form
StudentID | StudentName  | SSN         | Major
1002      | Mary Smith   | 112-23-3214 | Accounting
1005      | John Doe     | 213-66-3456 | Physics
1007      | Alice Walker | 342-43-5690 | Chemistry
1003      | Dave Smith   | 465-34-5465 | Electronics
1010      | Dian Hall    | 753-23-0958 | Physics
1012      | Lisa Gilmore | 564-34-9078 | Sociology

ClassName | Description   | Dept_ID
Math 102  | Algebra I     | 1003
Math 105  | Calculus I    | 1003
Stat 120  | Statistics I  | 1003
Chem 210  | Org. Chem     | 1001
Lit 100   | English Lit I | 1007
Soc 101   | Intro. Soc    | 1006

Dept_ID | Department
1000    | Accounting
1001    | Chemistry
1002    | Electronics
1003    | Mathematics
1004    | Music
1005    | Physics
1006    | Sociology
1007    | English

StudentID | ClassName | Grade
1002      | Math 102  | A
1005      | Math 105  | B
1007      | Math 105  | A
1002      | Stat 120  | C
1007      | Chem 210  | B
1003      | Math 105  | A
1003      | Chem 210  | C
1010      | Lit 100   | B
1012      | Soc 101   | A
Another Example
1NF:  (A, B, C, D, E, G)
2NF:  (B, C, D, E, G), (C, A)
3NF:  (B, C, D, E), (D, G), (C, A)
BCNF: (C, D, E), (E, B), (D, G), (C, A)  (switch keys)
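The “switch keys” step can be illustrated with candidate-key enumeration. The FDs used here, $BC \to DE$, $C \to A$, $D \to G$, and $E \to B$, are an assumption reconstructed to be consistent with the decomposition steps; they are not stated explicitly in this example. Under that assumption, (B, C, D, E) has two candidate keys, BC and CE. The FD $E \to B$ has a determinant that is not a candidate key, so the relation is in 3NF (B belongs to a key) but not in BCNF; switching the key to CE and splitting off (E, B) resolves it. A minimal Python sketch:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys

# Assumed FDs restricted to (B, C, D, E); C -> A and D -> G involve attributes outside it.
fds = [({"B", "C"}, {"D", "E"}), ({"E"}, {"B"})]
r1 = {"B", "C", "D", "E"}
print([sorted(k) for k in candidate_keys(r1, fds)])  # [['B', 'C'], ['C', 'E']]
print(closure({"E"}, fds) >= r1)                     # False: E is not a key, so BCNF fails
```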
Relational Algebra
• The manipulative part of the relational model is called relational algebra.
• It is a collection of operators that take relations as their operands and return a relation as their result.
• Two groups of operators:
   Set operators: union, intersection, difference, and Cartesian product
   Relational operators: restrict (select), project, join, and divide
Relational Operators
• Restrict: returns a relation containing all tuples from a specified relation that satisfy a specified condition.
• Project: returns a relation containing all (sub)tuples that remain in a specified relation after specified attributes have been removed.
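Both operators are short set comprehensions if a relation is modeled as a Python set of tuples, each tuple a frozenset of (attribute, value) pairs; this representation and the helper names are assumptions for illustration. Here `project` takes the attributes to keep, which is equivalent to naming the attributes to remove.

```python
def rel(*rows):
    """Build a relation from dicts; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def restrict(r, condition):
    """Tuples of r for which condition(tuple_as_dict) is true."""
    return {t for t in r if condition(dict(t))}

def project(r, attrs):
    """Sub-tuples of r over attrs; duplicate sub-tuples collapse into one."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

enrollment = rel(
    {"StudentID": 1002, "ClassName": "Math 102", "Grade": "A"},
    {"StudentID": 1005, "ClassName": "Math 105", "Grade": "B"},
    {"StudentID": 1007, "ClassName": "Math 105", "Grade": "A"},
)
grade_a = restrict(enrollment, lambda t: t["Grade"] == "A")
print(len(grade_a))  # 2
print(project(enrollment, {"ClassName"}) == rel({"ClassName": "Math 102"}, {"ClassName": "Math 105"}))  # True
```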
Set Operators
• Product: returns a relation containing all possible tuples that are a combination of two tuples, one from each of two specified relations.
• Union: returns a relation containing all tuples that appear in either or both of two specified relations.
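Using the same assumed representation (a set of frozensets of (attribute, value) pairs), product merges every pair of tuples and union is plain set union. This minimal sketch adds the usual preconditions as checks: product requires disjoint attribute names, and union requires the same attributes on both sides (checked only when both relations are non-empty, since an empty set carries no attribute names here).

```python
def rel(*rows):
    """Build a relation from dicts; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def attributes(r):
    """Attribute names used by the tuples of r."""
    return {a for t in r for a, _ in t}

def product(r, s):
    """All combinations of one tuple from r with one tuple from s."""
    if attributes(r) & attributes(s):
        raise ValueError("product needs relations with no attribute names in common")
    return {t | u for t in r for u in s}

def union(r, s):
    """Tuples in r, in s, or in both (both must have the same attributes)."""
    if r and s and attributes(r) != attributes(s):
        raise ValueError("union needs relations with the same attributes")
    return r | s

students = rel({"StudentID": 1002}, {"StudentID": 1005})
classes = rel({"ClassName": "Math 102"}, {"ClassName": "Math 105"}, {"ClassName": "Stat 120"})
print(len(product(students, classes)))                                      # 6
print(len(union(students, rel({"StudentID": 1005}, {"StudentID": 1007}))))  # 3
```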
Set Operators (cont’d)
• Intersect: returns a relation containing all tuples that appear in both of two specified relations.
• Difference: returns a relation containing all tuples that appear in the first and not in the second of two specified relations.
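With relations modeled as Python sets (the same assumed representation as a set of frozensets of (attribute, value) pairs), intersect and difference map onto `&` and `-`. This sketch uses the students enrolled in Math 105 and Chem 210 from the enrollment table earlier; both operands are assumed to have the same attributes.

```python
def rel(*rows):
    """Build a relation from dicts; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def intersect(r, s):
    """Tuples that appear in both r and s (same attributes assumed)."""
    return r & s

def difference(r, s):
    """Tuples that appear in r but not in s (same attributes assumed)."""
    return r - s

math_105 = rel({"StudentID": 1005}, {"StudentID": 1007}, {"StudentID": 1003})
chem_210 = rel({"StudentID": 1007}, {"StudentID": 1003})
print(sorted(dict(t)["StudentID"] for t in intersect(math_105, chem_210)))   # [1003, 1007]
print(sorted(dict(t)["StudentID"] for t in difference(math_105, chem_210)))  # [1005]
```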
Relational Operators (cont’d)
• Join: returns a relation containing all possible tuples that are a combination of two tuples, one from each of two specified relations, such that the two tuples contributing to any given combination have a common value for the common attributes of the two relations.
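The join described here is the natural join. A minimal sketch using the same assumed representation (a set of frozensets of (attribute, value) pairs): two tuples combine only if they agree on every shared attribute. When the relations share no attributes, the condition is vacuously true and the join degenerates to the product.

```python
def rel(*rows):
    """Build a relation from dicts; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def join(r, s):
    """Natural join: merge tuples of r and s that agree on all common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return out

enrollment = rel(
    {"StudentID": 1002, "ClassName": "Math 102", "Grade": "A"},
    {"StudentID": 1005, "ClassName": "Math 105", "Grade": "B"},
    {"StudentID": 1002, "ClassName": "Stat 120", "Grade": "C"},
)
classes = rel(
    {"ClassName": "Math 102", "Description": "Algebra I"},
    {"ClassName": "Math 105", "Description": "Calculus I"},
    {"ClassName": "Stat 120", "Description": "Statistics I"},
    {"ClassName": "Soc 101", "Description": "Intro. Soc"},  # no enrollment: dropped
)
result = join(enrollment, classes)
print(sorted((dict(t)["StudentID"], dict(t)["Description"]) for t in result))
# [(1002, 'Algebra I'), (1002, 'Statistics I'), (1005, 'Calculus I')]
```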
Relational Operators (cont’d)
• Divide: takes two unary relations and one binary relation and returns a relation containing all tuples from one unary relation that appear in the binary relation matched with all tuples in the other unary relation.
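A minimal sketch of this form of divide, using the same assumed representation (sets of frozensets of (attribute, value) pairs). The `x_attr` and `y_attr` parameters, naming the single attribute of each unary relation, are hypothetical. The example asks which students take every class in a required set, using pairs from the enrollment table earlier; if the divisor is empty, every dividend tuple qualifies vacuously.

```python
def rel(*rows):
    """Build a relation from dicts; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def divide(dividend, divisor, mediator, x_attr, y_attr):
    """Tuples of the unary dividend (attribute x_attr) that the binary mediator
    pairs with every tuple of the unary divisor (attribute y_attr)."""
    pairs = {(dict(t)[x_attr], dict(t)[y_attr]) for t in mediator}
    ys = [dict(t)[y_attr] for t in divisor]
    return {t for t in dividend if all((dict(t)[x_attr], y) in pairs for y in ys)}

students = rel(*({"StudentID": s} for s in (1002, 1003, 1005, 1007)))
required = rel({"ClassName": "Math 105"}, {"ClassName": "Chem 210"})
enrollment = rel(*(
    {"StudentID": s, "ClassName": c}
    for s, c in [(1002, "Math 102"), (1005, "Math 105"), (1007, "Math 105"),
                 (1002, "Stat 120"), (1007, "Chem 210"), (1003, "Math 105"),
                 (1003, "Chem 210")]
))
result = divide(students, required, enrollment, "StudentID", "ClassName")
print(sorted(dict(t)["StudentID"] for t in result))  # [1003, 1007]
```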