Schema Refinement and Normal Forms 2013 1 CS3754 Class Notes #7, John Shieh

Schema Refinement and Normal Forms


Page 1: Schema Refinement and Normal Forms

Schema Refinement and Normal Forms

2013 1CS3754 Class Notes #7, John Shieh

Page 2: Schema Refinement and Normal Forms

2 CS4753/2006F

Normalization

• It is a process that we can use to remove design flaws from a database

• There are a number of normal forms, which are sets of rules describing what we should and should not do in our table structure

• In practice, 3NF is usually sufficient to avoid the data redundancy problems of a relational database design

Page 3: Schema Refinement and Normal Forms

Problems caused by redundancy

• Redundant Storage
  – Some information is stored repeatedly.

• Update Anomalies
  – If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated.

• Insertion Anomalies
  – It may not be possible to store certain information unless some other, unrelated, information is stored.

• Deletion Anomalies
  – It may not be possible to delete certain information without losing some other, unrelated, information.


Page 4: Schema Refinement and Normal Forms

• Redundant Storage
  – The hourly wages depend on rating levels. So, for example, hourly wage 10 for rating level 8 is repeated three times.

• Update Anomalies
  – The hourly_wages in the first tuple could be updated without making a similar change in the second tuple.

Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40


Page 5: Schema Refinement and Normal Forms

• Insertion Anomalies
  – We cannot insert a tuple for an employee unless we know the hourly wage for the employee’s rating value.

• Deletion Anomalies
  – If we delete all tuples with a given rating value (e.g. the tuples of Smethurst and Guldu), we lose the association between the rating value and its hourly_wage value.

Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40


Page 6: Schema Refinement and Normal Forms

Decompositions

• Intuitively, redundancy arises when a relational schema forces an association between attributes that is not natural.

• Functional dependencies can be used to identify such situations and suggest refinements to the schema.

• The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relations.


Page 7: Schema Refinement and Normal Forms

Original relation:

Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40

Decomposed into:

Id           name       lot  rating  Hours_worked
123-22-3666  Attishoo   48   8       40
231-31-5368  Smiley     22   8       30
131-24-3650  Smethurst  35   5       30
434-26-3751  Guldu      35   5       32
612-67-4134  Madayan    35   8       40

rating  Hourly_wages
8       10
5       7

A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas, each of which contains a subset of the attributes of R and which together include all attributes in R.

Functional dependency: rating → Hourly_wages (rating determines Hourly_wages)
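The decomposition above can be sketched in a few lines of Python. This is only an illustration and not part of the original notes: the helper names `project` and `natural_join` are hypothetical, and each tuple is modelled as a plain dictionary. The sketch projects the table onto the two attribute sets and then checks that joining the pieces on rating gives back exactly the original tuples.

```python
# The employee instance from the slide, one tuple per row.
COLUMNS = ["id", "name", "lot", "rating", "hourly_wages", "hours_worked"]
ROWS = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]
employees = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Join two relations on every attribute name they share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


# The two 'smaller' relations from the slide.
emps = project(employees, ["id", "name", "lot", "rating", "hours_worked"])
wages = project(employees, ["rating", "hourly_wages"])

# Joining them back on rating recovers the original instance.
rejoined = natural_join(emps, wages)
by_id = lambda row: row["id"]
print(sorted(rejoined, key=by_id) == sorted(employees, key=by_id))
```

In the decomposed form each rating's wage is stored once in `wages`, so a wage change touches one tuple instead of every employee with that rating.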


Page 8: Schema Refinement and Normal Forms

Functional Dependencies

• A functional dependency (FD) is a kind of integrity constraint (IC) that generalizes the concept of a key.

• Let R be a relation schema, and let X and Y be nonempty sets of attributes in R.
  – An FD X → Y exists if, in every relation instance of R, any two tuples that agree on the value of X also agree on the value of Y.
  – More formally:
    • An FD X → Y exists in R if every instance of R preserves the FD X → Y.
    • We say that an instance r of R preserves the FD X → Y if the following holds for every pair of tuples t1 and t2 in r:

      If t1.X = t2.X, then t1.Y = t2.Y

The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X.
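The pairwise condition can be checked mechanically for a single instance. The sketch below is a minimal illustration, assuming tuples are stored as dictionaries; the function name `preserves_fd` is hypothetical. Instead of comparing every pair of tuples, it remembers the Y-value first seen for each X-value, which is equivalent to the definition above. The data is the employee table from the earlier slides.

```python
def preserves_fd(rows, lhs, rhs):
    """Return True if any two rows that agree on lhs also agree on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x_val = tuple(row[a] for a in lhs)
        y_val = tuple(row[a] for a in rhs)
        # setdefault stores y_val on first sight, else returns the stored one.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


employees = [
    {"name": "Attishoo", "lot": 48, "rating": 8, "hourly_wages": 10, "hours_worked": 40},
    {"name": "Smiley", "lot": 22, "rating": 8, "hourly_wages": 10, "hours_worked": 30},
    {"name": "Smethurst", "lot": 35, "rating": 5, "hourly_wages": 7, "hours_worked": 30},
    {"name": "Guldu", "lot": 35, "rating": 5, "hourly_wages": 7, "hours_worked": 32},
    {"name": "Madayan", "lot": 35, "rating": 8, "hourly_wages": 10, "hours_worked": 40},
]

print(preserves_fd(employees, ["rating"], ["hourly_wages"]))  # True
print(preserves_fd(employees, ["rating"], ["hours_worked"]))  # False
```

A `True` result only says that this one instance preserves the FD. It does not show that the FD holds over the schema.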

Page 9: Schema Refinement and Normal Forms

student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  3170       Database
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3160       Algorithms

Examples:

Is course_ID → course_name preserved? Yes.

Is {student_ID, course_ID} → course_name preserved? Yes.

If no two rows agree on the value of X, then X → Y is trivially preserved. This is the case for X = {student_ID, course_ID} in the table above.


Page 10: Schema Refinement and Normal Forms

student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  3170       Database
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3160       Algorithms

The table instance also preserves the following:

student_ID → student_name

{student_ID, course_ID} → {student_name, course_name}

{student_ID, course_ID} → {student_ID, student_name, course_ID, course_name}

student_name → student_name (a trivial dependency)

{student_name, course_name} → student_name (also trivial)

many more ...


Page 11: Schema Refinement and Normal Forms

How do we know if an FD exists in R?

• Can we check all instances of R to see if the FD is preserved?
  – Definitely not possible!
  – Whether or not a functional dependency exists must be determined by assumptions given in advance, or by common sense, not by individual relation instances.

• Given an instance r of R, we can check whether r preserves some functional dependency f, but we cannot tell whether f holds over R.

Does course_ID → student_name hold? No. Although it is preserved by this table, it does not fit the assumptions.

student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  2150       Graph Theory
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3000       Compiler


Page 12: Schema Refinement and Normal Forms

• The assumptions given in advance, or common sense, impose some constraints, and are called the semantics of a database

• Assumptions given in advance impose explicit constraints; common sense imposes implicit constraints


Page 13: Schema Refinement and Normal Forms

Example:

• The application is to keep track of information about employees in a company.

• Information to be kept track of includes:

eid: employee’s id number

ename: employee name

address: address of the employee

sex: employee’s sex

dname: name of the department that the employee works for

dhname: department head’s name

dhsex: department head’s sex


Page 14: Schema Refinement and Normal Forms

Let’s construct a relation schema as follows:

Employee(eid, ename, address, sex, dname, dhname, dhsex)

Assumptions:

a: Each employee’s id number is unique

b: Each employee has a unique address

c: Each employee works for only one department

d: A person can be the head of at most one department

e: All department heads have different names

Implicit: common sense

Which of the following dependencies are true?

1. eid → ename
2. ename → eid
3. eid → address
4. eid → sex
5. sex → address
6. dhname → dname
7. dhname → eid
8. dhsex → sex


Page 15: Schema Refinement and Normal Forms

• X is a superkey for relation schema R iff X → attri(R), where attri(R) denotes the set of all the attributes in schema R

• X is a candidate key (or simply, key) for R iff
  – X → attri(R), and
  – X is minimal, i.e., for any proper subset Y of X, Y does not determine attri(R)

• In other words, a candidate key is a minimal superkey

Examples:

(student_ID, course_ID) is a candidate key (and the only one)

(student_ID, course_ID, course_name) is a superkey, but not a candidate key

(student_ID, course_ID, student_name) is another non-candidate superkey

(student_ID, course_ID, course_name, student_name) is also a non-candidate superkey

student_ID  student_name   course_ID  course_name
111         Chan Tai Man   3170       Database
222         Wong Siu Ling  3170       Database
333         Tam Wai Ming   3160       Algorithms
111         Chan Tai Man   3160       Algorithms
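Superkeys and candidate keys can be computed from a set of FDs with the standard attribute-closure procedure. The sketch below is an illustration only. It assumes that student_ID → student_name and course_ID → course_name hold over the schema (the notes only show that the sample instance preserves them), and the function names `closure` and `candidate_keys` are hypothetical. The key search is brute force, which is fine for a four-attribute schema but grows exponentially with the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Enumerate minimal superkeys, trying small attribute sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


ATTRS = {"student_ID", "student_name", "course_ID", "course_name"}
FDS = [  # assumed to hold over the schema, not just the sample instance
    ({"student_ID"}, {"student_name"}),
    ({"course_ID"}, {"course_name"}),
]

print(candidate_keys(ATTRS, FDS))
# A superkey that is not minimal still has the full closure:
print(closure({"student_ID", "course_ID", "course_name"}, FDS) == ATTRS)  # True
```

Under these assumed FDs the only candidate key found is {student_ID, course_ID}, which matches the statement on the slide.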


Page 16: Schema Refinement and Normal Forms

Normal Forms

1st Normal Form: no repeating data groups

2nd Normal Form: no partial key dependency

3rd Normal Form: no transitive dependency

Boyce-Codd Normal Form: every determinant is a candidate key

4th Normal Form: no multi-valued dependency

5th Normal Form: no join dependency

5NF ⊂ 4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF


Page 17: Schema Refinement and Normal Forms


Normal Form (NF)

• 1NF: each attribute or column value must be atomic

• 2NF: the schema is in 1NF, and all attributes that are not part of the primary key are fully functionally dependent on the primary key

• 3NF: the schema is in 2NF, and all transitive dependencies have been removed

Ex: employeeDept(employeeID, name, job, deptID, deptName) has to be converted to

employee(employeeID, name, job, deptID)

Dept(deptID, deptName)

Page 18: Schema Refinement and Normal Forms


2NF

• It means that each non-key attribute must be functionally dependent on all parts of the primary key (i.e., the combination of the composite attributes of the key).

• Example: not 2NF
  Employee(employeeID, name, job, departmentID, skill)

  employeeID, skill → name, job, departmentID

  employeeID → name, job, departmentID

  (Note: → means "determines")

• Break the table into two tables to reach 2NF:
  Employee(employeeID, name, job, departmentID)

  employeeSkills(employeeID, skill)
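The 2NF test on this example can be sketched as a search for partial dependencies: a proper subset of the key that already determines a non-key attribute. This is a minimal illustration under the assumption that the relation has a single candidate key, given explicitly; `closure` and `partial_dependencies` are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """Return (key_part, attribute) pairs that violate 2NF.

    Assumes `key` is the only candidate key of the relation.
    """
    non_key = all_attrs - key
    found = []
    for size in range(1, len(key)):  # proper, nonempty subsets of the key
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_key):
                found.append((set(part), attr))
    return found


# Employee(employeeID, name, job, departmentID, skill) from the slide.
ATTRS = {"employeeID", "name", "job", "departmentID", "skill"}
KEY = {"employeeID", "skill"}
FDS = [({"employeeID"}, {"name", "job", "departmentID"})]

for part, attr in partial_dependencies(KEY, ATTRS, FDS):
    print(sorted(part), "->", attr)

# After the split, neither table has a partial dependency.
print(partial_dependencies({"employeeID"},
                           {"employeeID", "name", "job", "departmentID"}, FDS))
print(partial_dependencies({"employeeID", "skill"},
                           {"employeeID", "skill"}, []))
```

The first loop reports that employeeID alone determines departmentID, job and name, which is why the original table is not in 2NF.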

Page 19: Schema Refinement and Normal Forms


3NF

• Example: 2NF but not 3NF
  Employee(employeeID, name, job, departmentID, departmentName)

  Here employeeID → departmentID

  employeeID → departmentName

  Also departmentID → departmentName, and departmentID is not a key

  Therefore, employeeID → departmentName is a transitive dependency

• Convert the schema to 3NF by breaking it into two tables:
  Employee(employeeID, name, job, departmentID)

  Department(departmentID, departmentName)
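The same example can be checked against the usual textbook formulation of 3NF: for every nontrivial FD X → A, either X is a superkey or A belongs to some candidate key. The sketch below assumes this formulation and takes the candidate keys as given; `closure` and `third_nf_violations` are hypothetical names, not part of the original notes.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(all_attrs, fds, keys):
    """Return (lhs, attribute) pairs for the FDs that break 3NF."""
    prime = set().union(*keys)  # attributes that appear in some candidate key
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        for attr in sorted(rhs - lhs):  # ignore the trivial part of the FD
            if not is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad


# Employee(employeeID, name, job, departmentID, departmentName)
ATTRS = {"employeeID", "name", "job", "departmentID", "departmentName"}
FDS = [
    ({"employeeID"}, {"name", "job", "departmentID"}),
    ({"departmentID"}, {"departmentName"}),
]
print(third_nf_violations(ATTRS, FDS, [{"employeeID"}]))

# The two tables produced by the decomposition have no violations.
print(third_nf_violations({"employeeID", "name", "job", "departmentID"},
                          FDS[:1], [{"employeeID"}]))
print(third_nf_violations({"departmentID", "departmentName"},
                          FDS[1:], [{"departmentID"}]))
```

The only violation reported for the original table is departmentID → departmentName, the dependency that makes employeeID → departmentName transitive.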

Page 20: Schema Refinement and Normal Forms


Normal Forms Defined Informally

• 1st normal form
  – All attributes depend on the key

• 2nd normal form
  – All attributes depend on the whole key

• 3rd normal form
  – All attributes depend on nothing but the key

Page 21: Schema Refinement and Normal Forms


SUMMARY OF NORMAL FORMS based on Primary Keys