Database Design Normal forms & Normalization Compiled by S. Z.


Learning Objectives

• Data integrity
• Lossy vs. lossless decomposition (data loss vs. no data loss)
• Problems with bad database design:
  – Incompleteness (easy to fix: collect all necessary attributes)
  – Redundancy
• Attribute decomposition
• Identifying key attributes and functional dependencies among attributes
• Normal forms:
  – 1NF, 2NF, 3NF, BCNF, 4NF
• What normalization is
• How tables are transformed from lower normal forms to higher normal forms
• What role normalization plays in the database design process
• How normalization and ER modeling are used concurrently to produce a good database design
• How some situations require denormalization to generate information efficiently


• Table design is more challenging than manipulating tables.


Data Integrity

• Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life cycle. It is a critical aspect of the design, implementation, and use of any database system that stores, processes, or retrieves data.
• Data integrity is the opposite of:
  – Incomplete data and data corruption (a form of data loss)
  – Data redundancy

Problems with unnormalized data

• Contains redundancy
• Contains multivalued (non-atomic) fields
• Does not have an identified primary key
• Causes data anomalies
• Etc.

[Figure: Example of Unnormalized Data]


The idea is simple!

• This is a table design problem.


Issues of bad database design

• Another example: a company that manages building projects
  – Charges its clients by billing the hours spent on each contract
  – The hourly billing rate depends on the employee's position
  – Periodically, a report is generated that contains the information displayed in Table 5.1


Issues of bad database design

• The structure of the data set in Figure 5.1 does not handle data very well
• The table structure appears to work; the report is generated with ease
• Unfortunately, the report may yield different results depending on which data anomaly has occurred

How do we measure the goodness of a database design?

• The relational model provides formal criteria for judging how well a table design avoids redundancy and data anomalies.
• These criteria are called normal forms (NF).


Normal Form

• Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF.

• The normal forms are progressive, so to achieve Second Normal Form, the tables must already be in First Normal Form.

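• Each higher normal form is stricter than the one below it: a 2NF table is exposed to fewer redundancy problems than a 1NF table, and a 3NF table to fewer than a 2NF table.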


More advanced normal forms

• There are now other normal forms (BCNF, 4NF, 5NF) that are generally accepted, but 3NF is widely considered sufficient for most applications.
  – These are outside the scope of the 242 course and will be studied in the 342 course.
• Most tables that reach 3NF are also in BCNF (Boyce-Codd Normal Form).
• The highest level of normalization is not always the most desirable; trade-offs among various factors must also be considered. For example, tables are sometimes denormalized to reduce I/O, which increases processing speed.


When do you need to check normal forms?

• Tables mapped from an ER model usually meet 3NF, but there is no guarantee.
• Therefore, once you have a database design with tables mapped from the constructs of the ER diagram, you need to check each table against the normal forms up to 3NF.
• If a table meets 3NF, it is usually in good shape. Otherwise, the table usually needs to be decomposed further.


What keys are important?

• Candidate keys vs primary keys


Dependency and partial dependency

• What is dependency?
  – If you look at two attributes in a table, there are two kinds of relationship between them:
    • They are independent of each other, for example age and state in a student table.
    • One depends on the other; in other words, one determines the other. For example, ssn and age: age depends on ssn. If you know a person's ssn, you know his/her age. This shows the dependency of age on ssn, which is usually the key.
• Partial dependency (possible only when the primary key consists of multiple fields)
  – Fields within the table are dependent on only part of the primary key


Dependency vs. determinant

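• A determinant is the reverse side of a dependency: in a functional dependency X → Y, X is the determinant and Y is the dependent attribute (Y depends on X).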

• A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
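As a minimal sketch, assuming each table row is represented as a Python dict, the dependency X → Y can be tested on sample rows: it holds when no two rows agree on X but differ on Y. The function name `fd_holds` and the student rows are hypothetical. Sample data can only refute a dependency; whether it holds in general is a question about the business rules.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows.

    It holds when no two rows agree on every lhs attribute but differ
    on some rhs attribute.
    """
    seen: dict = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical sample rows from a student table.
students = [
    {"ssn": "111", "age": 20, "state": "NY"},
    {"ssn": "222", "age": 22, "state": "NY"},
    {"ssn": "333", "age": 20, "state": "CA"},
]

print(fd_holds(students, ["ssn"], ["age"]))    # True: ssn determines age
print(fd_holds(students, ["state"], ["age"]))  # False: two NY rows differ in age
```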


First normal form (1NF)

• A table is in 1NF when:
  – A primary key field has been identified
  – There are no multivalued or composite attributes; i.e., each attribute is atomic, with one value for each attribute
• Applies to every relation


Another Example: Table 2

| Title                     | Author1              | Author2        | ISBN       | Subject          | Pages | Publisher   |
|---------------------------+----------------------+----------------+------------+------------------+-------+-------------|
| Database System Concepts  | Abraham Silberschatz | Henry F. Korth | 0072958863 | MySQL, Computers | 1168  | McGraw-Hill |
| Operating System Concepts | Abraham Silberschatz | Henry F. Korth | 0471694665 | Computers        | 944   | McGraw-Hill |


Similar problems with Table 2

• First, this table is not very efficient with storage.
• Second, this design does not protect data integrity.
• Third, this table does not scale well.


First Normal Form

• In our Table 2, we have two violations of First Normal Form:
  – First, we have more than one author field.
  – Second, our subject field contains more than one piece of information. With more than one value in a single field, it would be very difficult to search for all books on a given subject.

First Normal Form Table

• Table 3:

| Title                     | Author               | ISBN       | Subject   | Pages | Publisher   |
|---------------------------+----------------------+------------+-----------+-------+-------------|
| Database System Concepts  | Abraham Silberschatz | 0072958863 | MySQL     | 1168  | McGraw-Hill |
| Database System Concepts  | Henry F. Korth       | 0072958863 | Computers | 1168  | McGraw-Hill |
| Operating System Concepts | Henry F. Korth       | 0471694665 | Computers | 944   | McGraw-Hill |
| Operating System Concepts | Abraham Silberschatz | 0471694665 | Computers | 944   | McGraw-Hill |


• We now have two rows for a single book. Additionally, we would be violating the Second Normal Form…

• A better solution to our problem would be to split the data into separate tables, an Author table and a Subject table, removing that information from the Book table:

Subject Table

| Subject_ID | Subject   |
|------------+-----------|
| 1          | MySQL     |
| 2          | Computers |

Author Table

| Author_ID | Last Name    | First Name |
|-----------+--------------+------------|
| 1         | Silberschatz | Abraham    |
| 2         | Korth        | Henry      |

Book Table

| ISBN       | Title                     | Pages | Publisher   |
|------------+---------------------------+-------+-------------|
| 0072958863 | Database System Concepts  | 1168  | McGraw-Hill |
| 0471694665 | Operating System Concepts | 944   | McGraw-Hill |

• Each table has a primary key, which is used to join tables together when querying the data. A primary key value must be unique within the table (no two books can have the same ISBN), and most database systems automatically index the primary key, which speeds up data retrieval based on it.
• Now we need to define the relationships between the tables.

Relationships

Book_Author Table

| ISBN       | Author_ID |
|------------+-----------|
| 0072958863 | 1         |
| 0072958863 | 2         |
| 0471694665 | 1         |
| 0471694665 | 2         |

Book_Subject Table

| ISBN       | Subject_ID |
|------------+------------|
| 0072958863 | 1          |
| 0072958863 | 2          |
| 0471694665 | 2          |
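The split of Table 2 into Author, Subject, Book, Book_Author, and Book_Subject tables can be sketched in Python as a one-off conversion. This is an illustration only: the helper names `assign_ids` and `split_subjects` are hypothetical, IDs are numbered in order of first appearance, and full author names stand in for the separate Last Name / First Name columns.

```python
# Table 2 rows: repeating Author1/Author2 columns and a comma-separated Subject.
table2 = [
    {"Title": "Database System Concepts", "Author1": "Abraham Silberschatz",
     "Author2": "Henry F. Korth", "ISBN": "0072958863",
     "Subject": "MySQL, Computers", "Pages": 1168, "Publisher": "McGraw-Hill"},
    {"Title": "Operating System Concepts", "Author1": "Abraham Silberschatz",
     "Author2": "Henry F. Korth", "ISBN": "0471694665",
     "Subject": "Computers", "Pages": 944, "Publisher": "McGraw-Hill"},
]


def assign_ids(names):
    """Number distinct values 1, 2, ... in order of first appearance."""
    ids = {}
    for name in names:
        ids.setdefault(name, len(ids) + 1)
    return ids


def split_subjects(row):
    """Turn the multivalued Subject field into a list of atomic values."""
    return [s.strip() for s in row["Subject"].split(",")]


author_ids = assign_ids(a for r in table2 for a in (r["Author1"], r["Author2"]))
subject_ids = assign_ids(s for r in table2 for s in split_subjects(r))

# Book keeps only single-valued facts; the junction tables hold the pairings.
book = [(r["ISBN"], r["Title"], r["Pages"], r["Publisher"]) for r in table2]
book_author = [(r["ISBN"], author_ids[a]) for r in table2
               for a in (r["Author1"], r["Author2"])]
book_subject = [(r["ISBN"], subject_ids[s]) for r in table2
                for s in split_subjects(r)]

print(subject_ids)   # {'MySQL': 1, 'Computers': 2}
print(book_subject)  # [('0072958863', 1), ('0072958863', 2), ('0471694665', 2)]
```

The resulting `book_author` and `book_subject` lists match the Book_Author and Book_Subject tables above.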

Second normal form (2NF)

• A table is in second normal form (2NF) when:
  – It is in 1NF
  – It has no partial dependencies

Normalization (continued)

• Basic procedure for identifying partial dependencies:
  – Look at each field that is not part of the composite primary key
  – Make certain that both parts of the composite key are required to determine the value of that field


Second Normal Form (2NF)

• As the First Normal Form deals with redundancy of data across a horizontal row, Second Normal Form (or 2NF) deals with redundancy of data in vertical columns.

• The Book Table will be used for the 2NF example

2NF Table

Publisher Table

| Publisher_ID | Publisher Name |
|--------------+----------------|
| 1            | McGraw-Hill    |

Book Table

| ISBN       | Title                     | Pages | Publisher_ID |
|------------+---------------------------+-------+--------------|
| 0072958863 | Database System Concepts  | 1168  | 1            |
| 0471694665 | Operating System Concepts | 944   | 1            |


2NF

• Here we have a one-to-many relationship between the book table and the publisher. A book has only one publisher, and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the Book Table, pointing to the primary key of the Publisher Table.

• The other requirement for Second Normal Form is that you cannot have any data in a table with a composite key that does not relate to all portions of the composite key.


Bad Example

• studenttable (student ID, course ID, course name, grade), with the composite primary key (student ID, course ID)
• In this case, course name depends on course ID only, which is called a partial dependency. Thus the table violates 2NF.
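Given the primary key and a list of functional dependencies, the check for partial dependencies can be sketched as follows. The attribute names and the FD list are assumptions read off the studenttable example; the function only inspects the dependencies it is given, so identifying them correctly remains the designer's job.

```python
from itertools import combinations

# studenttable(student_id, course_id, course_name, grade); the key and the
# FD list are assumptions based on the example above.
key = ("student_id", "course_id")
fds = [
    (("student_id", "course_id"), ("grade",)),
    (("course_id",), ("course_name",)),
]


def partial_dependencies(key, fds):
    """Return FDs in which a non-key attribute depends on only part of the key."""
    # Every non-empty proper subset of the composite key.
    proper_parts = {frozenset(c)
                    for n in range(1, len(key))
                    for c in combinations(key, n)}
    found = []
    for lhs, rhs in fds:
        nonkey = tuple(a for a in rhs if a not in key)
        if frozenset(lhs) in proper_parts and nonkey:
            found.append((lhs, nonkey))
    return found


print(partial_dependencies(key, fds))  # [(('course_id',), ('course_name',))]
```

Moving course name into its own table keyed by course ID, and keeping grade with the full key, removes the violation.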

Third normal form (3NF)

• A table is in third normal form (3NF) when:
  – It is in 2NF, and
  – It has no transitive dependencies; i.e., the non-primary-key attributes are mutually independent
• Transitive dependency
  – A field is dependent on another field within the table that is not the primary key field


Third Normal Form

• Third normal form (3NF) requires that there are no functional dependencies of non-key attributes on something other than a candidate key.

• A table is in 3NF if all of the non-primary key attributes are mutually independent

• There should not be transitive dependencies


Bad Example

• studenttable2 (sid, student, state, state governor)
  – In this case, state depends on sid, while state governor depends on state and, through state, on sid. So there is a transitive dependency of state governor on sid via state.
  – Thus the table violates 3NF.
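A similar sketch flags transitive dependencies: an FD whose determinant is neither part of the key nor a superkey, yet decides other non-key attributes. The FD list is assumed from the studenttable2 example, and the function presumes the table is already in 2NF (determinants that are part of the key are skipped).

```python
# studenttable2(sid, student, state, state_governor); sid is the primary key.
# The FD list is an assumption based on the example above.
key = {"sid"}
fds = [
    ({"sid"}, {"student", "state"}),
    ({"state"}, {"state_governor"}),
]


def transitive_dependencies(key, fds):
    """Return FDs where a non-key determinant decides other non-key
    attributes, i.e. key -> determinant -> dependent."""
    found = []
    for lhs, rhs in fds:
        dependents = rhs - lhs - key
        # Skip superkeys (key <= lhs) and parts of the key (lhs <= key).
        determinant_is_nonkey = not (lhs <= key) and not (key <= lhs)
        if dependents and determinant_is_nonkey:
            found.append((lhs, dependents))
    return found


print(transitive_dependencies(key, fds))  # [({'state'}, {'state_governor'})]
```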

• For most business database design purposes, 3NF is as high as we need to go in the normalization process

• 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys, i.e. multiple composite candidate keys with at least one attribute in common.


The Boyce-Codd Normal Form (BCNF)

• BCNF requires that:
  – The table is in 3NF
  – Every determinant in the table is a candidate key
• A candidate key has the same characteristics as the primary key but, for some reason, was not chosen to be the primary key
• When a table contains only one candidate key, 3NF and BCNF are equivalent
• BCNF can be violated only when the table contains more than one candidate key
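Checking BCNF comes down to asking, for every dependency, whether its determinant is a superkey, which can be decided by computing the attribute closure. The sketch below uses a hypothetical enrollment relation (student, course, instructor) in which each instructor teaches one course. It has two candidate keys, {student, course} and {student, instructor}, and it is in 3NF, but it is not in BCNF because instructor is a determinant that is not a candidate key.

```python
def closure(attrs, fds):
    """Return the closure attrs+: every attribute determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attributes]


# Hypothetical enrollment relation: each instructor teaches exactly one
# course, and a student takes a given course from one instructor.
attributes = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

print(closure({"student", "instructor"}, fds) == attributes)  # True: it is a key
print(bcnf_violations(attributes, fds))  # [({'instructor'}, {'course'})]
```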


The Boyce-Codd Normal Form (BCNF) (continued)

• Most designers consider BCNF a special case of 3NF; therefore, BCNF is sometimes called 3.5NF.
• A table can be in 3NF and yet fail to meet BCNF when:
  – It contains no partial dependencies and no transitive dependencies, but
  – A nonkey attribute is the determinant of a key attribute


• If a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist.


Fourth normal form (4NF)

• A table in 3NF may contain multivalued dependencies that produce either numerous null values or redundant data
• It may be necessary to convert a 3NF table to fourth normal form (4NF) by splitting the table to remove the multivalued dependencies


Fourth Normal Form (4NF)

• A relation is in 4NF if it is already in BCNF and has no multivalued dependencies other than those implied by its candidate keys.
• A table is in fourth normal form (4NF) when both of the following are true:
  – It is in BCNF (and therefore also in 3NF)
  – It has no multiple sets of independent multivalued dependencies (discussed in later slides); i.e., a record type contains at most one independent multivalued fact about an entity
• 4NF is largely academic if tables conform to the following two rules:
  – All attributes must be dependent on the primary key, but independent of each other
  – No row contains two or more multivalued facts about an entity

Multiple Multivalued Dependencies

• Multiple multivalued dependencies exist when:
  – There are at least three attributes A, B, and C in a relation, and
  – For each value of A there is a well-defined set of values for B and a well-defined set of values for C, but the set of values of B is independent of the set of values of C
  – Every possible combination of the two multivalued attributes then has to be stored in the database, leading to redundancy and consequent anomalies

• Consider the table below. It is acceptable in the sense that all three fields combined serve as the primary key of the table.
• However, it contains multiple (two in this case) multivalued sets: instructors and textbooks, each with respect to course ID.

| Course ID | Instructor | Textbook       |
|-----------+------------+----------------|
| CSCI242   | Zhang      | Intro to MYSQL |
| CSCI242   | Allison    | MYSQL          |
| CSCI242   | Zhang      | Oracle         |
| CSCI242   | Allison    | Intro to MYSQL |

• By splitting the above relation into two relations and placing each multivalued attribute in a table by itself, we can convert the above to 4NF:
  – Course-INST(course-ID, Instructor)
  – Course-TEXT(course-ID, Textbook)
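The decomposition can be sketched with Python sets: project the course table onto Course-INST and Course-TEXT, then join the two back on course ID. If instructors and textbooks really are independent, as the slide assumes, the single table would have to hold every instructor/textbook pairing, and the rejoin shows that the four rows above leave out two of them. If those two pairings are genuinely invalid, the facts are dependent and this decomposition would not be lossless. Variable names are illustrative.

```python
# The course table from the slide above: (course_id, instructor, textbook).
course = {
    ("CSCI242", "Zhang", "Intro to MYSQL"),
    ("CSCI242", "Allison", "MYSQL"),
    ("CSCI242", "Zhang", "Oracle"),
    ("CSCI242", "Allison", "Intro to MYSQL"),
}

# Project onto the two 4NF tables; each fact is now stored once.
course_inst = {(c, i) for c, i, _ in course}
course_text = {(c, t) for c, _, t in course}

# Natural join on course_id: every pairing implied by independence.
rejoined = {(c1, i, t)
            for c1, i in course_inst
            for c2, t in course_text
            if c1 == c2}

print(len(course_inst), len(course_text), len(rejoined))  # 2 3 6
print(sorted(rejoined - course))
# [('CSCI242', 'Allison', 'Oracle'), ('CSCI242', 'Zhang', 'MYSQL')]
```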


• Other problems caused by violating fourth normal form are similar in spirit to those mentioned earlier for violations of second or third normal form. They take different variations depending on the chosen maintenance policy:

• If there are repetitions, then updates have to be done in multiple records, and they could become inconsistent.

• Insertion of a new skill may involve looking for a record with a blank skill, or inserting a new record with a possibly blank language, or inserting multiple records pairing the new skill with some or all of the languages.

• Deletion of a skill may involve blanking out the skill field in one or more records (perhaps with a check that this doesn't leave two records with the same language and a blank skill), or deleting one or more records, coupled with a check that the last mention of some language hasn't also been deleted.

• Fourth normal form minimizes such update problems.


Independence

• We mentioned independent multi-valued facts earlier, and we now illustrate what we mean in terms of the example. The two many-to-many relationships, employee:skill and employee:language, are "independent" in that there is no direct connection between skills and languages. There is only an indirect connection because they belong to some common employee. That is, it does not matter which skill is paired with which language in a record; the pairing does not convey any information. That's precisely why all the maintenance policies mentioned earlier can be allowed.

• In contrast, suppose that an employee could only exercise certain skills in certain languages. Perhaps Smith can cook French cuisine only, but can type in French, German, and Greek. Then the pairings of skills and languages become meaningful, and there is no longer an ambiguity of maintenance policies. In the present case, only the following form is correct:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

Thus the employee:skill and employee:language relationships are no longer independent. These records do not violate fourth normal form. When there is an interdependence among the relationships, it is acceptable to represent them in a single record.


4.1.2 Multivalued Dependencies

• For readers interested in pursuing the technical background of fourth normal form a bit further, we mention that fourth normal form is defined in terms of multivalued dependencies, which correspond to our independent multi-valued facts. Multivalued dependencies, in turn, are defined essentially as relationships which accept the "cross-product" maintenance policy mentioned above. That is, for our example, every one of an employee's skills must appear paired with every one of his languages. It may or may not be obvious to the reader that this is equivalent to our notion of independence: since every possible pairing must be present, there is no "information" in the pairings. Such pairings convey information only if some of them can be absent, that is, only if it is possible that some employee cannot perform some skill in some language. If all pairings are always present, then the relationships are really independent.

• We should also point out that multivalued dependencies and fourth normal form apply as well to relationships involving more than two fields. For example, suppose we extend the earlier example to include projects, in the following sense:

  – An employee uses certain skills on certain projects.
  – An employee uses certain languages on certain projects.
• If there is no direct connection between the skills and languages that an employee uses on a project, then we could treat this as two independent many-to-many relationships of the form EP:S and EP:L, where "EP" represents a combination of an employee with a project. A record including employee, project, skill, and language would violate fourth normal form. Two records, containing fields E,P,S and E,P,L, respectively, would satisfy fourth normal form.
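The cross-product test described above can be written as a small function: X ->> Y holds when, for each X value, the stored (Y, rest) pairs are exactly all combinations of the Y values and the remaining values. This is a sketch assuming rows are dicts that share the same keys; `mvd_holds`, `rows_of`, and the independent sample rows are hypothetical, while the dependent rows come from the Smith table above.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Test the multivalued dependency x ->> y on a list of dict rows.

    For each x-value, the (y, z) pairs present must equal the full cross
    product of the y-values and z-values seen with it, where z is every
    attribute outside x and y.
    """
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for r in rows:
        key = tuple(r[a] for a in x)
        pair = (tuple(r[a] for a in y), tuple(r[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


def rows_of(triples):
    return [dict(zip(("EMPLOYEE", "SKILL", "LANGUAGE"), t)) for t in triples]


# The dependent Smith table: cooking is done in French only.
dependent = rows_of([("Smith", "cook", "French"), ("Smith", "type", "French"),
                     ("Smith", "type", "German"), ("Smith", "type", "Greek")])
# Hypothetical independent facts: every skill paired with every language.
independent = rows_of([("Smith", s, l) for s in ("cook", "type")
                       for l in ("French", "German")])

print(mvd_holds(dependent, ["EMPLOYEE"], ["SKILL"]))    # False
print(mvd_holds(independent, ["EMPLOYEE"], ["SKILL"]))  # True
```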


Fifth normal form (5NF)

• Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

• A relation is said to be in the 5NF if and only if every non-trivial join dependency in it is implied by the candidate keys.


• A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.


[Figure: Normal Forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF)]

UNAVOIDABLE REDUNDANCIES

Normalization certainly doesn't remove all redundancies. Certain redundancies seem to be unavoidable, particularly when several multivalued facts are dependent rather than independent.

INTER-RECORD REDUNDANCY

The normal forms discussed here deal only with redundancies occurring within a single record type. Fifth normal form is considered to be the "ultimate" normal form with respect to such redundancies.

Other redundancies can occur across multiple record types. For the example concerning employees, departments, and locations, the following records are in third normal form in spite of the obvious redundancy:

| EMPLOYEE | DEPARTMENT |
| DEPARTMENT | LOCATION |
| EMPLOYEE | LOCATION |

In fact, two copies of the same record type would constitute the ultimate in this kind of undetected redundancy. This is beyond the scope of this course.

7 CONCLUSION

While we have tried to present the normal forms in a simple and understandable way, we are by no means suggesting that the data design process is correspondingly simple. The design process involves many complexities which are quite beyond the scope of this paper. In the first place, an initial set of data elements and records has to be developed, as candidates for normalization. Then the factors affecting normalization have to be assessed:
• Single-valued vs. multi-valued facts
• Dependency on the entire key
• Independent vs. dependent facts
• The presence of mutual constraints
• The presence of non-unique or non-singular representations
And, finally, the desirability of normalization has to be assessed, in terms of its performance impact on retrieval applications.

There are algorithms for converting a given "bad" database design into an increasingly better design.

Now we know how to verify whether or not a database design conforms to the normal forms. Still, we want to know: how do we design databases that meet the expectations of the normal forms?

Judging whether a person is healthy is probably easier than making a person healthy, to say the least in some cases!

Database Normalization

• Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability, by removing unnormalized relationships (such as partial and transitive dependencies) between attributes in the same table.
• Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query involving those tables is issued.
• If you want to normalize data, normalize at the higher level first, i.e., normalize the table structure (the metadata that describes the data).

History

• Edgar F. Codd first proposed the process of normalization, and what came to be known as the first normal form, in his paper "A Relational Model of Data for Large Shared Data Banks."
• Codd stated: "There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by 'domains whose elements are atomic (nondecomposable) values.'"

Database Tables and Normalization

• Normalization
  – A step-by-step process used to determine which data elements should be stored in which tables
  – Purpose:
    • Evaluate and correct table structures
    • Minimize data redundancy without losing information
    • Reduce data anomalies
• Multiple levels of normalization
  – Works through a series of stages called normal forms:
    • First normal form (1NF)
    • Second normal form (2NF)
    • Third normal form (3NF)

The Normalization Process

• Each table represents a single subject
• No data item will be unnecessarily stored in more than one table
• All attributes in a table are dependent on the primary key


Conversion to First Normal Form

• Repeating group
  – Derives its name from the fact that a group of multiple entries of the same type can exist for any single key attribute occurrence
• A relational table must not contain repeating groups
• Normalizing the table structure will reduce data redundancies
• Conversion to 1NF is a three-step procedure


Conversion to First Normal Form (continued)

• Step 1: Eliminate the Repeating Groups
  – Present data in tabular format, where each cell has a single value and there are no repeating groups
  – Eliminate repeating groups: eliminate nulls by making sure that each repeating group attribute contains an appropriate data value
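Step 1 can be sketched as a fill-down pass over report rows: wherever a repeating-group attribute is left blank, the value from the row above is copied in. The column names (`proj_num`, `proj_name`, `emp_num`) and sample values are hypothetical, loosely modeled on the building-project company example.

```python
# Hypothetical report rows: the project fields are printed only on the first
# row of each group, so the rows below it carry None (a repeating group).
report = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 103},
    {"proj_num": None, "proj_name": None, "emp_num": 101},
    {"proj_num": 18, "proj_name": "Amber Wave", "emp_num": 114},
    {"proj_num": None, "proj_name": None, "emp_num": 118},
]


def fill_repeating_group(rows, group_fields):
    """Return new rows in which each None in group_fields is replaced by the
    most recent non-null value above it, so every cell holds a single value."""
    filled, last_seen = [], {}
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        for field in group_fields:
            if row[field] is None:
                row[field] = last_seen.get(field)
            else:
                last_seen[field] = row[field]
        filled.append(row)
    return filled


for row in fill_repeating_group(report, ["proj_num", "proj_name"]):
    print(row)
```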


Conversion to First Normal Form (continued)

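• Step 2: Identify the Primary Key
  – The primary key must uniquely identify attribute values: each key value must determine exactly one row
  – A new key may have to be composed from existing attributes (a composite key)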
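A quick sanity check for a proposed primary key is that no two rows share its values. The sketch below (hypothetical rows and function name) shows that neither project number nor employee number alone identifies a row, while the composite (proj_num, emp_num) does. Uniqueness in a sample is necessary but not sufficient: the key must also stay unique for all future data, and a candidate key should be minimal.

```python
# Hypothetical 1NF rows after Step 1 (the hours values are made-up samples).
rows = [
    {"proj_num": 15, "emp_num": 103, "hours": 23.8},
    {"proj_num": 15, "emp_num": 101, "hours": 19.4},
    {"proj_num": 18, "emp_num": 114, "hours": 24.6},
    {"proj_num": 18, "emp_num": 101, "hours": 12.0},
]


def is_unique(rows, attrs):
    """True if no two rows in this sample share the same values for attrs."""
    values = [tuple(r[a] for a in attrs) for r in rows]
    return len(values) == len(set(values))


print(is_unique(rows, ["proj_num"]))             # False
print(is_unique(rows, ["emp_num"]))              # False
print(is_unique(rows, ["proj_num", "emp_num"]))  # True
```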


Conversion to First Normal Form (continued)

• Step 3: Identify All Dependencies
  – Dependencies can be depicted with the help of a diagram
  – Dependency diagram:
    • Depicts all dependencies found within a given table structure
    • Helpful in getting a bird's-eye view of all relationships among a table's attributes
    • Makes it less likely that you will overlook an important dependency


Conversion to First Normal Form (continued)

• First normal form describes a tabular format in which:
  – All key attributes are defined
  – There are no repeating groups in the table
  – All attributes are dependent on the primary key
• All relational tables satisfy 1NF requirements
• Some tables contain partial dependencies
  – Dependencies based on only part of the primary key
  – Sometimes used for performance reasons, but should be used with caution
  – Still subject to data redundancies


Conversion to Second Normal Form

• Relational database design can be improved by converting the database into second normal form (2NF)

• Conversion to 2NF takes two steps


Conversion to Second Normal Form (continued)

• Step 1: Write Each Key Component on a Separate Line
  – Write each key component on a separate line, then write the original (composite) key on the last line
  – Each component will become the key in a new table


Conversion to Second Normal Form (continued)

• Step 2: Assign Corresponding Dependent Attributes
  – Determine which attributes are dependent on which other attributes
  – At this point, most anomalies have been eliminated
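The two steps can be sketched as a function that lists every key component (with the full composite key last), then assigns each non-key attribute to the smallest component that the supplied FDs say determines it. The attribute names and FDs are hypothetical, modeled on the building-project example; the function trusts the FD list and leaves transitive dependencies alone, since those are handled in 3NF.

```python
from itertools import combinations


def to_2nf(key, attributes, fds):
    """Split a table with a composite key into 2NF tables.

    key: tuple of key attributes; attributes: all attributes of the table;
    fds: dict mapping a determinant tuple to the attributes it determines.
    Step 1: list every non-empty subset of the key, full key last.
    Step 2: put each non-key attribute with the first (smallest) subset
    listed as determining it, or with the full key if none is.
    """
    parts = [c for n in range(1, len(key) + 1) for c in combinations(key, n)]
    tables = {p: list(p) for p in parts}
    for attr in attributes:
        if attr in key:
            continue
        owner = next((p for p in parts if attr in fds.get(p, ())), key)
        tables[owner].append(attr)
    # Drop key-part tables that received no dependent attributes.
    return {p: cols for p, cols in tables.items()
            if p == key or len(cols) > len(p)}


# Hypothetical assignment table for the building-project example.
key = ("proj_num", "emp_num")
attributes = ["proj_num", "proj_name", "emp_num", "emp_name",
              "job_class", "chg_hour", "hours"]
fds = {
    ("proj_num",): ("proj_name",),
    ("emp_num",): ("emp_name", "job_class", "chg_hour"),
    ("proj_num", "emp_num"): ("hours",),
}

for part, cols in to_2nf(key, attributes, fds).items():
    print(part, cols)
```

Here `chg_hour` stays with `emp_num`; its dependency on `job_class` is a transitive dependency that the 3NF conversion removes.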


Conversion to Second Normal Form (continued)

• A table is in second normal form (2NF) when:
  – It is in 1NF, and
  – It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key


Conversion to Third Normal Form

• The data anomalies created by transitive dependencies are easily eliminated by completing three steps
• Step 1: Identify Each New Determinant
  – For every transitive dependency, write its determinant as the PK for a new table
  – Determinant: any attribute whose value determines other values within a row


Conversion to Third Normal Form (continued)

• Step 2: Identify the Dependent Attributes
  – Identify the attributes dependent on each determinant identified in Step 1, and identify the dependency
  – Name the table to reflect its contents and function


Conversion to Third Normal Form (continued)

• Step 3: Remove the Dependent Attributes from Transitive Dependencies
  – Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have such a transitive relationship
  – Draw a new dependency diagram to show all tables defined in Steps 1–3
  – Check the new tables, as well as the tables modified in Step 3, to make sure that each table has a determinant and that no table contains inappropriate dependencies
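The three steps can be sketched for a single table, assuming it is already in 2NF and that its FDs are given as (determinant, dependents) pairs. The employee table and its FDs are hypothetical, echoing the earlier point that the hourly billing rate depends on the employee's position.

```python
def to_3nf(key, attributes, fds):
    """Remove transitive dependencies from a table assumed to be in 2NF.

    key: set of primary-key attributes; attributes: ordered list of columns;
    fds: list of (determinant set, dependent set) pairs.
    """
    new_tables, moved = [], set()
    for lhs, rhs in fds:
        if not (lhs <= key) and not (key <= lhs):  # Step 1: non-key determinant
            dependents = sorted(rhs - lhs - key)    # Step 2: its dependents
            new_tables.append(sorted(lhs) + dependents)
            moved.update(dependents)
    base = [a for a in attributes if a not in moved]  # Step 3: remove them
    return [base] + new_tables


# Hypothetical EMPLOYEE table from the 2NF step: the billing rate depends on
# the employee's job class, not directly on the employee number.
key = {"emp_num"}
attributes = ["emp_num", "emp_name", "job_class", "chg_hour"]
fds = [
    ({"emp_num"}, {"emp_name", "job_class"}),
    ({"job_class"}, {"chg_hour"}),
]

print(to_3nf(key, attributes, fds))
# [['emp_num', 'emp_name', 'job_class'], ['job_class', 'chg_hour']]
```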


Conversion to Third Normal Form (continued)

• A table is in third normal form (3NF) when both of the following are true:
  – It is in 2NF
  – It contains no transitive dependencies
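The two 3NF conditions can be combined into a small classifier. This is a sketch under stated assumptions: the table is already in 1NF, its primary key is its only candidate key, the functional dependencies (hypothetical here) are listed explicitly, and only direct dependencies are checked for transitivity.

```python
from itertools import combinations

# Hypothetical dependencies shared by the tables being classified.
FDS = [
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
]


def closure(attrs, fds):
    """Every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def normal_form(primary_key, attributes, fds):
    """Return '1NF', '2NF' or '3NF' for a table already in 1NF."""
    key = set(primary_key)
    non_key = set(attributes) - key
    # 2NF test: no proper part of the key may determine a non-key attribute.
    for size in range(1, len(primary_key)):
        for part in combinations(primary_key, size):
            if closure(part, fds) & non_key:
                return "1NF"
    # 3NF test: no non-key attribute(s) may determine another non-key attribute.
    for lhs, rhs in fds:
        if set(lhs) <= non_key and (set(rhs) - set(lhs)) & non_key:
            return "2NF"
    return "3NF"


print(normal_form(("PROJ_NUM", "EMP_NUM"),
                  ["PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME",
                   "JOB_CLASS", "CHG_HOUR", "HOURS"], FDS))               # 1NF
print(normal_form(("EMP_NUM",),
                  ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"], FDS))  # 2NF
print(normal_form(("EMP_NUM",), ["EMP_NUM", "EMP_NAME", "JOB_CLASS"], FDS))  # 3NF
print(normal_form(("JOB_CLASS",), ["JOB_CLASS", "CHG_HOUR"], FDS))           # 3NF
```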


Improving the Design

• Table structures are cleaned up to eliminate troublesome initial partial and transitive dependencies

• Normalization cannot, by itself, be relied on to make good designs

• It is valuable because its use helps eliminate data redundancies


Improving the Design (continued)

• Issues to address in order to produce a good normalized set of tables:
  – Evaluate PK Assignments
  – Evaluate Naming Conventions
  – Refine Attribute Atomicity
  – Identify New Attributes
  – Identify New Relationships
  – Refine Primary Keys as Required for Data Granularity
  – Maintain Historical Accuracy
  – Evaluate Using Derived Attributes



Surrogate Key Considerations

• When a primary key is considered unsuitable, designers use surrogate keys: artificial, usually system-generated keys with no business meaning

• Data entries in Table 5.3 are inappropriate because they duplicate existing records
  – Yet there has been no violation of either entity integrity or referential integrity
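The point about duplicates can be reproduced with Python's sqlite3 module. The booking table below is hypothetical and does not reproduce Table 5.3. Its surrogate key BOOKING_ID (an INTEGER PRIMARY KEY, which SQLite fills in automatically) keeps entity integrity intact even when the same booking is entered twice. A UNIQUE index on the natural attributes is one common way to block such duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical booking table: BOOKING_ID is a surrogate key generated by SQLite.
conn.execute("""
    CREATE TABLE booking (
        booking_id   INTEGER PRIMARY KEY,
        booking_date TEXT NOT NULL,
        time_slot    TEXT NOT NULL,
        room         TEXT NOT NULL
    )
""")
insert = "INSERT INTO booking (booking_date, time_slot, room) VALUES (?, ?, ?)"
conn.execute(insert, ("2024-06-01", "AM", "Hall A"))
conn.execute(insert, ("2024-06-01", "AM", "Hall A"))  # same facts, new surrogate value

# Entity integrity holds (every key is unique and non-null), yet the data is duplicated.
count = conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0]
print(count)  # 2

# Remove the duplicate and let a UNIQUE index on the natural attributes guard against it.
conn.execute("DELETE FROM booking WHERE booking_id = 2")
conn.execute("CREATE UNIQUE INDEX uq_booking ON booking (booking_date, time_slot, room)")
try:
    conn.execute(insert, ("2024-06-01", "AM", "Hall A"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The surrogate key remains the primary key used for joins; the unique index only enforces the business rule that one room cannot be booked twice for the same slot.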



Normalization and Database Design

• Normalization should be part of the design process

• Make sure that proposed entities meet the required normal form before table structures are created

• Many real-world databases have been improperly designed, or have become burdened with anomalies because they were improperly modified over time

• You may be asked to redesign and modify existing databases


Normalization and Database Design (continued)

• ER diagram
  – Provides the big picture, or macro view, of an organization’s data requirements and operations
  – Created through an iterative process
    • Identifying relevant entities, their attributes, and their relationships
    • Using the results to identify additional entities and attributes


Normalization and Database Design (continued)

• Normalization procedures
  – Focus on characteristics of specific entities
  – Represent a micro view of entities within the ER diagram

• It is difficult to separate the normalization process from the ER modeling process

• The two techniques should be used concurrently



Denormalization

• Creation of normalized relations is an important database design goal

• Processing requirements should also be a goal

• If tables are decomposed to conform to normalization requirements:
  – The number of database tables expands


Denormalization (continued)

• Joining the larger number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing system speed

• Conflicts between design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization
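One common compromise is to keep the normalized tables as the system of record and build a denormalized copy for reporting. The sketch below uses Python's sqlite3 module with hypothetical PROJECT, EMPLOYEE, and ASSIGNMENT tables: the three-way join runs once when the report table is built, and later reports read a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project  (proj_num INTEGER PRIMARY KEY, proj_name TEXT);
CREATE TABLE employee (emp_num  INTEGER PRIMARY KEY, emp_name  TEXT);
CREATE TABLE assignment (
    proj_num INTEGER REFERENCES project(proj_num),
    emp_num  INTEGER REFERENCES employee(emp_num),
    hours    REAL,
    PRIMARY KEY (proj_num, emp_num)
);
INSERT INTO project  VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO employee VALUES (101, 'Ann'), (102, 'Bob');
INSERT INTO assignment VALUES (1, 101, 10.0), (1, 102, 12.5), (2, 101, 7.0);

-- Denormalized reporting copy: the join cost is paid once, here.
CREATE TABLE project_report AS
SELECT p.proj_num, p.proj_name, e.emp_name, a.hours
FROM assignment AS a
JOIN project  AS p ON p.proj_num = a.proj_num
JOIN employee AS e ON e.emp_num  = a.emp_num;
""")

# Reports now read one table with no joins.
report = conn.execute(
    "SELECT proj_name, SUM(hours) FROM project_report "
    "GROUP BY proj_name ORDER BY proj_name"
).fetchall()
print(report)  # [('Alpha', 22.5), ('Beta', 7.0)]
```

The copy goes stale as soon as the base tables change, so it has to be rebuilt or kept in sync; that maintenance burden is the price of the faster reads.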


Denormalization (continued)

• Unnormalized tables in a production database tend to suffer from these defects:
  – Data updates are less efficient because programs that read and update tables must deal with larger tables
  – Indexing is more cumbersome
  – Unnormalized tables yield no simple strategies for creating virtual tables known as views
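Views make the contrast concrete. With normalized tables, a view can present the joined, wide shape that users want without storing any data twice. The sketch below uses Python's sqlite3 module with hypothetical EMPLOYEE and JOB tables; an update to one JOB row is visible through the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour REAL);
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_name  TEXT,
    job_class TEXT REFERENCES job(job_class)
);
INSERT INTO job VALUES ('Analyst', 96.75), ('Programmer', 35.75);
INSERT INTO employee VALUES (101, 'Ann', 'Analyst'), (102, 'Bob', 'Programmer');

-- The view presents the joined shape without storing it twice.
CREATE VIEW employee_rate AS
SELECT e.emp_num, e.emp_name, e.job_class, j.chg_hour
FROM employee AS e JOIN job AS j ON j.job_class = e.job_class;
""")

# One UPDATE to the JOB table changes the rate for every analyst seen through the view.
conn.execute("UPDATE job SET chg_hour = 100.0 WHERE job_class = 'Analyst'")
rates = conn.execute(
    "SELECT emp_name, chg_hour FROM employee_rate ORDER BY emp_num"
).fetchall()
print(rates)  # [('Ann', 100.0), ('Bob', 35.75)]
```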


Denormalization (continued)

• Use denormalization cautiously

• Understand why, under some circumstances, unnormalized tables are a better choice


Summary

• Normalization is a technique used to design tables in which data redundancies are minimized

• The first three normal forms (1NF, 2NF, and 3NF) are the most commonly encountered

• A table is in 1NF when all key attributes are defined and all remaining attributes are dependent on the primary key


Summary (continued)

• A table is in 2NF when it is in 1NF and contains no partial dependencies

• A table is in 3NF when it is in 2NF and contains no transitive dependencies

• A table that is not in 3NF may be split into new tables until all of the tables meet 3NF requirements

• Normalization is an important part, but only one part, of the design process


