THE UNIVERSITY OF TEXAS AT AUSTIN
SCHOOL OF INFORMATION
LIS 384K.11 (known as INF 385M, beginning with the Fall
Semester 2003)
DATABASE-MANAGEMENT PRINCIPLES AND
APPLICATIONS
R. E. Wyllys
Steps in Normalization
Contents: Section 1. Introduction
Section 2. Summary of Definitions of the Normal Forms
Section 3. Functional Dependency and Determinants
Section 4. The 1st Normal Form (1NF)
Section 5. The 2nd Normal Form (2NF)
Section 6. Anomalies and Normalization
Section 7. Turning a Table with Anomalies into Single-Theme Tables
Section 8. The 3rd Normal Form (3NF)
Section 9. The Boyce-Codd Normal Form (BCNF)
Section 10. The 4th Normal Form (4NF)
Section 11. The 5th Normal Form (5NF) and the Domain-Key Normal Form (DKNF)
Section 11.1. Converting a Table with Partial Dependencies into DKNF Tables
Section 11.2. Converting a Table with Transitive Dependencies into DKNF Tables
Section 11.3. Converting into DKNF a Table in Which Not Every Determinant Is a Candidate
Key
Section 11.4. Converting a Table with Multivalued Dependencies into DKNF
Section 11.5. Single-Theme Tables and the DKNF
Section 1. Introduction
This handout discusses the normalization of databases. Our goal here is to explain, and to
illustrate the need for, the various normal forms through examples of sets of relations. The
relations in the examples present various difficulties, which are removed by procedures
stemming from the relevant definitions of normal forms.
Note: This lesson presents a detailed discussion of normalization. For a simple introduction to
the ideas of normalization, one source is my lesson entitled Overview of Normalization.
Section 2. Summary of Definitions of the Normal Forms
1st Normal Form (1NF)
Definition: A table (relation) is in 1NF if
1. There are no duplicated rows in the table.
2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
3. Entries in a column (attribute, field) are of the same kind.
Note: The order of the rows is immaterial; the order of the columns is immaterial.
Note: The requirement that there be no duplicated rows in the table means that the table has a
key (although the key might be made up of more than one column--even, possibly, of all the
columns).
2nd Normal Form (2NF)
Definition: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all
of the key.
Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of
the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in
1NF and if it has no partial dependencies."
3rd Normal Form (3NF)
Definition: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
Boyce-Codd Normal Form (BCNF)
Definition: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
4th Normal Form (4NF)
Definition: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
5th Normal Form (5NF)
Definition: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in
4NF and if every join dependency in the table is a consequence of the candidate keys of the
table.
Domain-Key Normal Form (DKNF)
Definition: A table is in DKNF if every constraint on the table is a logical consequence of
the definition of keys and domains.
Section 3. Functional Dependency and Determinants
Before we develop the ideas of normalization further, it is important for you to have an
understanding of "functional dependency." The essence of this idea is that if the existence of
something, call it A, implies that B must exist and have a certain value, then we say that "B is
functionally dependent on A." We also often express this idea by saying that "A determines B,"
or that "B is a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the statement, "If A, then B." It
is important to note that the value B must be unique for a given value of A, i.e., any given value
of A must imply just one and only one value of B, in order for the relationship to qualify for the
name "function." (However, this does not necessarily prevent different values of A from
implying the same value of B.)
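The uniqueness requirement can be checked mechanically. The following is a minimal sketch, not part of the original lesson; the function name is_function and the sample pairs are assumptions chosen for illustration.

```python
def is_function(pairs):
    """Return True if every value of A in the (A, B) pairs implies exactly one B."""
    seen = {}
    for a, b in pairs:
        # setdefault stores the first B seen for this A and returns it afterwards
        if seen.setdefault(a, b) != b:
            return False  # the same A implies two different values of B
    return True

# Different values of A may imply the same value of B ...
many_to_one = [(3, 9), (4, 16), (-3, 9)]
# ... but a single value of A may not imply two different values of B.
one_to_many = [(9, 3), (9, -3)]

print(is_function(many_to_one))  # True
print(is_function(one_to_many))  # False
```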
For the terminology of relational databases, the word "function" was borrowed from
mathematics, where it is common to say things like "y is a function of x" or "y = f(x)". (The
latter expression is read "y equals f of x".) The determining value, x, is called the argument; the
determined value, y or f(x), is called the result.
The expression "y = f(x)" is a very general, and abstract, way of talking about functionality.
Outside of mathematics--and, in particular, ordinarily in relational database management--we
talk not abstractly but in terms of particular examples. (Indeed, the general idea of a "function" is
best understood when one has seen enough examples of specific functions to be able to start
generalizing about the abstract, or general, properties that the specific functions share.)
Here are some examples of functions. An easy one is y = x^2. This particular function says that if
we are given a particular value for x, say 3, then we must say that y has the value 9. (We could
also write y = f(x) = x^2 or just f(x) = x^2.) Another easy one is y = x^3. This particular function
says that if we are given a particular value for x, say -2, then we must say that y has the value -8.
A common way of indicating functions is to place the determining and determined values side by
side in a table. Thus we can place sample values of the function, y = x^2, in a table like the one
shown here.
Value of x ("argument," or "A") | Value of y = x^2 ("the function," or "the result," or "B")
3 | 9
4 | 16
-3 | 9
This table shows just three of the infinity of possible pairs of values, x and y, for the function
y = x^2. It also shows that for some functions, different values of x (here, 3 and -3) imply the
same value (here, 9) of the function.
The functions we have given as examples so far have been functions that are specified by an
algebraic formula. But the idea of function is more general; i.e., functions need not be
algebraically defined. The essence of the idea of function is that to a specified determining value
corresponds a unique determined value. This essence can be defined, among other ways, by placing
the determining and determined values in a table that displays and/or defines the relationship
between the argument and the result.
Note that the table above displays, but does not fully define, the relationship, y = x^2. This
function, since it has an infinite number of pairs of values, cannot be fully defined in a table. For
functions that involve only a finite number of pairs of values of argument and result, a table is
often a convenient way--and may in fact be the only way--of displaying and, at the same time,
defining the function.
Here is a simple example of a finite function that is both displayed and defined in a table. Most
of you will be familiar with the conventional (though often delightfully breakable) rules for
serving different types of wines with different courses in a dinner. Let us assume for the purpose
of this example that these rules can be summarized as follows: with meat, serve red wine; with
fish, white wine; and with cheese, rosé wine. Then the following table defines the course-wine
function:
Dinner Course | Type of Wine
meat | red
fish | white
cheese | rosé
But note that this table looks just like a database table. In fact, there is no reason not to
consider it a database table. Indeed, this table defines a relation in the database sense: it has
columns, each of which contains entries of the same kind, and it has no duplicate rows. In other
words, not only does the course-wine table display the data about the conventional rules for which
wine to serve with which course, but also the table can be viewed as defining a function for which
the determining value is the dinner course and the determined value is the type of wine. Thus we
can say that type of wine is functionally dependent on the dinner course, or equally well, that the
course determines the wine.
In relational database terminology, we often call the argument of the function (the dinner course
in this example) the "determinant", and we often use an arrow notation to exhibit the functional
dependency. Thus, we can say that the dinner course is the determinant of the type of wine, and
we can write: dinner course → wine. And we can say that the attribute, type of wine, is
functionally dependent on the attribute, dinner course.
In general, a functional dependency is a relationship among attributes. In relational databases, we
can have a determinant that governs one other attribute or several other attributes. To go back to
our mathematical examples for a moment, we could view the situation of functional dependency
of several attributes on one determinant as being like having several linked functions that share
an argument and can be displayed economically in just one table. For example, consider the
following table that displays sample values of the algebraic functions y = x^2, y = x^3, and
y = x^4.
Value of x | Value of x^2 | Value of x^3 | Value of x^4
3 | 9 | 27 | 81
4 | 16 | 64 | 256
-3 | 9 | -27 | 81
Looking at this table from the relational-database point of view, we can say that the attributes
x^2, x^3, and x^4 are all functionally dependent on the attribute x.
Similarly, we could expand the dinner-course and wine table to exhibit also the type of cutlery
that would be appropriate in the case of a formal dinner.
Dinner Course | Type of Wine | Type of Cutlery
meat | red | meat fork
fish | white | fish fork
cheese | rosé | cheese fork
From this table we see that the attributes, type of wine and type of cutlery, are functionally
dependent on the attribute, dinner course.
Using the arrow notation, we have:
dinner course → wine
and
dinner course → cutlery.
Section 4. The 1st Normal Form (1NF)
Now we are ready to come to grips with the ideas of normalization. The following table, containing
information about some students at Enormous State University, is a table that is in 1st Normal
Form, 1NF. (Here as elsewhere in the rest of this discussion, you may want to refer back to
Section 2. Summary of Definitions of the Normal Forms, where the various normal forms are
defined.)
Table 4.1
Social Security Number | FirstName | LastName | Major
123-45-6789 | Jack | Jones | Library and Information Science
222-33-4444 | Lynn | Lee | Library and Information Science
987-65-4321 | Mary | Ruiz | Pre-Medicine
123-54-3210 | Lynn | Smith | Pre-Law
111-33-5555 | Jane | Jones | Library and Information Science
You can easily verify for yourself that this table satisfies the definition of 1NF: viz., it has no
duplicated rows; each cell is single-valued (i.e., there are no repeating groups or arrays); and all
the entries in a given column are of the same kind.
In Table 4.1 we can see that the key, SSN, functionally determines the other attributes; i.e., a
given Social Security Number implies (determines) a particular value for each of the attributes
FirstName, LastName, and Major (assuming, at least for the moment, that a student is allowed to
have only one major). In the arrow notation: SSN → FirstName, SSN → LastName, and SSN → Major.
A key attribute will, by the definition of key, uniquely determine the values of the other
attributes in a table; i.e., all non-key attributes in a table will be functionally dependent on the
key. But there may be non-key attributes in a table that determine other attributes in that table.
Consider the following table:
Table 4.2
FirstName | LastName | Major | Level
Jack | Jones | LIS | Graduate
Lynn | Lee | LIS | Graduate
Mary | Ruiz | Pre-Medicine | Undergraduate
Lynn | Smith | Pre-Law | Undergraduate
Jane | Jones | LIS | Graduate
In Table 4.2 the Level attribute can be said to be functionally dependent on the Major attribute.
Thus we have an example of an attribute that is functionally dependent on a non-key attribute.
This statement is true in the table per se, and that is all that the definition of functional
dependence requires; but the statement also reflects the real-world fact that Library and
Information Science is a major that is open only to graduate students and that Pre-Medicine and
Pre-Law are majors that are open only to undergraduate students.
Section 5. The 2nd Normal Form (2NF)
Table 4.2 has another interesting aspect. Its key is a composite key, consisting of the paired
attributes, FirstName and LastName. The Level attribute is functionally dependent on this
composite key, of course; but, in addition, Level can be seen to be dependent on only the
attribute LastName. (This is true because each value of LastName is paired with just one value of
Level; e.g., both rows for Jones show Graduate. In contrast, there are two occurrences of the value
Lynn for the attribute FirstName,
and the two Lynns are paired with different values of Level, so Level is not functionally
dependent on FirstName.) Thus this table fails to qualify as a 2nd Normal Form table, since the
definition of 2NF requires that all non-key attributes be dependent on all of the key. (Admittedly,
this example of a partial dependency is artificially contrived, but nevertheless it illustrates the
problem of partial dependency.)
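The search for partial dependencies in Table 4.2 can be sketched in code. This is an illustrative sketch only; the helper names determines and partial_dependencies are assumptions, and the check looks only at the rows actually present in the table, which is all that the definition of functional dependence requires.

```python
from itertools import combinations

COLUMNS = ("FirstName", "LastName", "Major", "Level")
# Rows of Table 4.2
TABLE_4_2 = [
    ("Jack", "Jones", "LIS", "Graduate"),
    ("Lynn", "Lee", "LIS", "Graduate"),
    ("Mary", "Ruiz", "Pre-Medicine", "Undergraduate"),
    ("Lynn", "Smith", "Pre-Law", "Undergraduate"),
    ("Jane", "Jones", "LIS", "Graduate"),
]

def determines(rows, lhs, rhs):
    """True if, in these rows, the attributes in lhs functionally determine rhs."""
    lhs_idx = [COLUMNS.index(c) for c in lhs]
    rhs_idx = COLUMNS.index(rhs)
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        if seen.setdefault(key, row[rhs_idx]) != row[rhs_idx]:
            return False  # same determinant value, two different results
    return True

def partial_dependencies(rows, key, non_key):
    """List (part_of_key, attribute) pairs where a proper part of the key suffices."""
    found = []
    for size in range(1, len(key)):          # proper, non-empty parts of the key
        for part in combinations(key, size):
            for attr in non_key:
                if determines(rows, part, attr):
                    found.append((part, attr))
    return found

print(partial_dependencies(TABLE_4_2, ("FirstName", "LastName"), ("Major", "Level")))
# [(('LastName',), 'Major'), (('LastName',), 'Level')]
```

Besides the dependency of Level on LastName discussed above, the same rows also happen to make Major dependent on LastName alone; FirstName, by contrast, determines neither attribute because of the two Lynns.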
We can turn Table 4.2 into a table in 2NF in an easy way, by adding a column for the Social
Security Number, which will then be the natural thing to use as the key.
Table 5.1
SSN | FirstName | LastName | Major | Level
123-45-6789 | Jack | Jones | LIS | Graduate
222-33-4444 | Lynn | Lee | LIS | Graduate
987-65-4321 | Mary | Ruiz | Pre-Medicine | Undergraduate
123-54-3210 | Lynn | Smith | Pre-Law | Undergraduate
111-33-5555 | Jane | Jones | LIS | Graduate
With the SSN defined as the key, Table 5.1 is in 2NF, as you can easily verify. This illustrates
the fact that any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is
automatically also in 2NF.
Table 5.1 still exhibits some problems, however. For example, it contains some repeated
information about the LIS-Graduate pairing.
Section 6. Anomalies and Normalization
At this point it is appropriate to note that the main thrust behind the idea of normalizing
databases is the avoidance of insertion and deletion anomalies in databases.
To illustrate the idea of anomalies, consider what would happen to our knowledge (at least, as
explicitly contained in a table) of the level of the major, Pre-Medicine, if Mary Ruiz left
Enormous State University. With the deletion of the row for Ms. Ruiz, we would lose the
information that Pre-Medicine is an Undergraduate major. This is an example of a deletion
anomaly. We may possess the real-world information that Pre-Medicine is an Undergraduate
major, but no such information is explicitly contained in a table in our database.
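The deletion anomaly can be reproduced in a few lines. This is a hedged sketch using the rows of Table 5.1; the helper name known_levels is an assumption introduced for illustration.

```python
# Rows of Table 5.1: (SSN, FirstName, LastName, Major, Level)
table_5_1 = [
    ("123-45-6789", "Jack", "Jones", "LIS", "Graduate"),
    ("222-33-4444", "Lynn", "Lee", "LIS", "Graduate"),
    ("987-65-4321", "Mary", "Ruiz", "Pre-Medicine", "Undergraduate"),
    ("123-54-3210", "Lynn", "Smith", "Pre-Law", "Undergraduate"),
    ("111-33-5555", "Jane", "Jones", "LIS", "Graduate"),
]

def known_levels(rows):
    """Collect the Major -> Level facts that the table explicitly contains."""
    return {major: level for _, _, _, major, level in rows}

before = known_levels(table_5_1)
# Delete the row for Mary Ruiz (SSN 987-65-4321).
after_deletion = known_levels([row for row in table_5_1 if row[0] != "987-65-4321"])

# The fact about Pre-Medicine disappears together with the student.
print(sorted(set(before) - set(after_deletion)))  # ['Pre-Medicine']
```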
As an example of an insertion anomaly, we can suppose that a new student wants to enroll in
ESU: e.g., suppose Jane Doe wants to major in Public Affairs. From the information in Table 5.1
we cannot tell whether Public Affairs is an Undergraduate or a Graduate major; in fact, we do
not even know whether Public Affairs is an established major at ESU. We do not know whether
it is permissible to insert the value, Public Affairs, as a value of the attribute, Major, or what to
insert for the attribute, Level, if we were to assume that Public Affairs is a valid value for Major.
The point is that while we may possess real-world information about whether Public Affairs is a
major at ESU and what its level is, this information is not explicitly contained in any table that
we have thus far mentioned as part of our database.
A database-management system, a DBMS, can work only with the information that we put
explicitly into its tables for a given database and into its rules for working with those tables,
where such rules are appropriate and possible.
How do anomalies relate to normalization? The simple answer is that by arranging that the tables
in a database are sufficiently normalized (in practice, this typically means to at least the 4th level
of normalization), we can ensure that anomalies will not arise in our database. Anomalies are
difficult to avoid directly, because with databases of typical complexity (i.e., several tables) the
database designer can easily overlook possible problems. Normalization offers a rigorous way of
avoiding unrecognized anomalies.
Normalization may look like a difficult process when one views it from the standpoint of the
formal definitions of the various normal forms, as presented in Section 2 of this handout. But in
practice, you can easily attain sufficient normalization in your database by simply ensuring that
the tables in your database are what we can call "single-theme" tables. This idea will be
illustrated as we proceed through the rest of the discussion in this handout.
Section 7. Turning a Table with Anomalies (Table 5.1) into Single-Theme Tables
Although Table 5.1 is in 2NF, it is still open to the problems of insertion and deletion anomalies,
as the discussion in the preceding section shows. The reason is that Table 5.1 deals with more
than a single theme. What can we do to turn it into a set of tables that are, or at least come closer
to being, single-theme tables?
A reasonable way to proceed is to note that Table 5.1 deals with both information about students
(their names and SSNs) and information about majors and levels. This should strike you as two
different themes. Presented below is one possible set of single-theme tables dealing with the
information in Table 5.1. (To save space, the following tables also contain some information that
is not in Table 5.1, and the discussion will deal with this added information.)
Table 7.1
SSN | FirstName | LastName
123-45-6789 | Jack | Jones
222-33-4444 | Lynn | Lee
987-65-4321 | Mary | Ruiz
123-54-3210 | Lynn | Smith
111-33-5555 | Jane | Jones
999-88-7777 | Newton | Gingpoor
Table 7.2
Major | Level
LIS | Graduate
Pre-Medicine | Undergraduate
Pre-Law | Undergraduate
Public Affairs | Graduate
Table 7.3
SSN | Major
123-45-6789 | LIS
222-33-4444 | LIS
987-65-4321 | Pre-Medicine
123-54-3210 | Pre-Law
111-33-5555 | LIS
The three preceding tables should strike you as providing a better arrangement of the information
in Table 5.1. For one thing, this arrangement puts the information about the students into a
smaller table, Table 7.1, which happily fails to contain redundant information about the LIS-
Graduate pairing. For another thing, this arrangement permits us to enter information about
students (e.g., Newton Gingpoor) who have not yet identified themselves as pursuing a particular
major. For still another thing, it puts the information about the Major-Level pairings into a
separate table, Table 7.2, which can easily be expanded to include information (e.g., that the
Public Affairs major is at the Graduate level) about majors for which, at the moment, there may
be no students registered. Finally, Table 7.3 provides the needed link between individual students
and their majors (note that Newton Gingpoor's SSN is not in this Table 7.3, which tells us that he
has not yet selected a major).
Tables 7.1 - 7.3 are single-theme tables and are in 2NF, as you can easily verify. (In fact, they are
in DKNF, but we are not yet ready to discuss the latter level in detail.)
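To see how a DBMS might hold these three single-theme tables, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (student, major, student_major, and so on) and the major "Astronomy" are assumptions made for this illustration; the rows come from Tables 7.1 - 7.3, with Lynn Smith's SSN taken as given in Tables 5.1 and 7.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE student (ssn TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE major (name TEXT PRIMARY KEY, level TEXT NOT NULL);
    CREATE TABLE student_major (
        ssn   TEXT NOT NULL REFERENCES student(ssn),
        major TEXT NOT NULL REFERENCES major(name),
        PRIMARY KEY (ssn, major)
    );
""")

with conn:  # commit the contents of Tables 7.1, 7.2, and 7.3 together
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
        ("123-45-6789", "Jack", "Jones"), ("222-33-4444", "Lynn", "Lee"),
        ("987-65-4321", "Mary", "Ruiz"), ("123-54-3210", "Lynn", "Smith"),
        ("111-33-5555", "Jane", "Jones"), ("999-88-7777", "Newton", "Gingpoor"),
    ])
    conn.executemany("INSERT INTO major VALUES (?, ?)", [
        ("LIS", "Graduate"), ("Pre-Medicine", "Undergraduate"),
        ("Pre-Law", "Undergraduate"), ("Public Affairs", "Graduate"),
    ])
    conn.executemany("INSERT INTO student_major VALUES (?, ?)", [
        ("123-45-6789", "LIS"), ("222-33-4444", "LIS"),
        ("987-65-4321", "Pre-Medicine"), ("123-54-3210", "Pre-Law"),
        ("111-33-5555", "LIS"),
    ])

# Joining the three tables recovers the five rows of Table 5.1.
rejoined = conn.execute("""
    SELECT s.ssn, s.first_name, s.last_name, m.name, m.level
    FROM student s
    JOIN student_major sm ON sm.ssn = s.ssn
    JOIN major m ON m.name = sm.major
    ORDER BY s.ssn
""").fetchall()

# Students with no row in student_major have not yet selected a major.
undeclared = [row[0] for row in conn.execute("""
    SELECT s.last_name FROM student s
    LEFT JOIN student_major sm ON sm.ssn = s.ssn
    WHERE sm.ssn IS NULL
""")]

def try_enroll(ssn, major):
    """Try to link a student to a major; report whether the DBMS accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student_major VALUES (?, ?)", (ssn, major))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_enroll("999-88-7777", "Public Affairs")  # listed in Table 7.2
rejected = try_enroll("999-88-7777", "Astronomy")       # hypothetical, not in Table 7.2
print(undeclared, accepted, rejected)  # ['Gingpoor'] True False
```

With the majors in a table of their own, the question of whether a value is permissible for Major is answered by the database itself rather than by real-world knowledge held outside it.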
Section 8. The 3rd Normal Form (3NF)
In order to discuss the 3rd Normal Form, we need to begin by discussing the idea of transitive
dependencies.
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C." An example is: "If John Doe is a human,
and if every human is a primate, then John Doe must be a primate." Another way of putting it is
this: "If A functionally governs B, and if B functionally governs C, then A functionally governs
C." In the arrow notation, we have:
[(A → B) and (B → C)] → (A → C)
The following table, Table 8.1, provides an example of how transitive dependencies can occur in
a table in a relational database.
Table 8.1
Author Last Name | Author First Name | Book Title | Subject | Collection or Library | Building
Berdahl | Robert | The Politics of the Prussian Nobility | History | PCL General Stacks | Perry-Castañeda Library
Yudof | Mark | Child Abuse and Neglect | Legal Procedures | Law Library | Townes Hall
Harmon | Glynn | Human Memory and Knowledge | Cognitive Psychology | PCL General Stacks | Perry-Castañeda Library
Graves | Robert | The Golden Fleece | Greek Literature | Classics Library | Waggener Hall
Miksa | Francis | Charles Ammi Cutter | Library Biography | Library and Information Science Collection | Perry-Castañeda Library
Hunter | David | Music Publishing and Collecting | Music Literature | Fine Arts Library | Fine Arts Building
Graves | Robert | English and Scottish Ballads | Folksong | PCL General Stacks | Perry-Castañeda Library
By examining Table 8.1 we can infer that books dealing with history, cognitive psychology, and
folksong are assigned to the PCL General Stacks collection; that books dealing with legal
procedures are assigned to the Law Library; that books dealing with Greek literature are assigned
to the Classics Library; that books dealing with library biography are assigned to the Library and
Information Science Collection (LISC); and that books dealing with music literature are assigned
to the Fine Arts Library.
Further, we can infer that the PCL General Stacks collection and the LISC are both housed in the
Perry-Castañeda Library (PCL) building; that the Classics Library is housed in Waggener Hall;
and that the Law Library and Fine Arts Library are housed, respectively, in Townes Hall and the
Fine Arts Building.
Thus we see that there is a transitive dependency in Table 8.1: any book that deals with history,
cognitive psychology, or library biography will be physically housed in the PCL building (unless
it is temporarily checked out to a borrower); any book dealing with legal procedures will be
housed in Townes Hall; and so on. In short, if we know what subject a book deals with, we also
know not only what library or collection it will be assigned to but also what building it is
physically housed in.
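The chain of inference just described can be written out as a composition of two lookups. The sketch below is illustrative only; the names collection_of, building_of, and compose are assumptions, and the pairs are those that can be read off Table 8.1.

```python
# Subject -> Collection or Library, as inferred from Table 8.1
collection_of = {
    "History": "PCL General Stacks",
    "Legal Procedures": "Law Library",
    "Cognitive Psychology": "PCL General Stacks",
    "Greek Literature": "Classics Library",
    "Library Biography": "Library and Information Science Collection",
    "Music Literature": "Fine Arts Library",
    "Folksong": "PCL General Stacks",
}
# Collection or Library -> Building, as inferred from Table 8.1
building_of = {
    "PCL General Stacks": "Perry-Castañeda Library",
    "Law Library": "Townes Hall",
    "Classics Library": "Waggener Hall",
    "Library and Information Science Collection": "Perry-Castañeda Library",
    "Fine Arts Library": "Fine Arts Building",
}

def compose(first, second):
    """Chain two functional dependencies: (A -> B) and (B -> C) give A -> C."""
    return {a: second[b] for a, b in first.items()}

# Subject -> Building follows without being stored anywhere.
building_for_subject = compose(collection_of, building_of)
print(building_for_subject["Legal Procedures"])  # Townes Hall
```

Because the Building column of Table 8.1 can be derived in this way, storing it in the same table as Subject adds no information and only creates room for inconsistency.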
What is wrong with having a transitive dependency or dependencies in a table? For one thing,
there is duplicated information: from three different rows we can see that the PCL General
Stacks are in the PCL building. For another thing, we have possible deletion anomalies: if the
Yudof book were lost and its row removed from Table 8.1, we would lose the information that
books on legal procedures are assigned to the Law Library and also the information that the Law
Library is in Townes Hall. As a third problem, we have possible insertion anomalies: if we
wanted to add a chemistry book to the table, we would find that Table 8.1 nowhere contains the
fact that the Chemistry Library is in Robert A. Welch Hall. As a fourth problem, we have the
chance of making errors in updating: a careless data-entry clerk might add a book to the LISC
but mistakenly enter Townes Hall in the building column.
The solution to the problem is, once again, to place the information in Table 8.1 into appropriate
single-theme tables. Here is one such possible arrangement:
Table 8.2
Author Last Name | Author First Name | Book Title
Berdahl | Robert | The Politics of the Prussian Nobility
Yudof | Mark | Child Abuse and Neglect
Harmon | Glynn | Human Memory and Knowledge
Graves | Robert | The Golden Fleece
Miksa | Francis | Charles Ammi Cutter
Hunter | David | Music Publishing and Collecting
Graves | Robert | English and Scottish Ballads
Table 8.3
Book Title | Subject
The Politics of the Prussian Nobility | History
Child Abuse and Neglect | Legal Procedures
Human Memory and Knowledge | Cognitive Psychology
The Golden Fleece | Greek Literature
Charles Ammi Cutter | Library Biography
Music Publishing and Collecting | Music Literature
English and Scottish Ballads | Folksong
Table 8.4
Subject | Collection or Library
History | PCL General Stacks
Legal Procedures | Law Library
Cognitive Psychology | PCL General Stacks
Greek Literature | Classics Library
Library Biography | Library and Information Science Collection
Music Literature | Fine Arts Library
Folksong | PCL General Stacks
Table 8.5
Collection or Library | Building
PCL General Stacks | Perry-Castañeda Library
Law Library | Townes Hall
Classics Library | Waggener Hall
Library and Information Science Collection | Perry-Castañeda Library
Fine Arts Library | Fine Arts Building
You can verify for yourself that none of these tables contains a transitive dependency; hence, all
of them are in 3NF (and, in fact, in DKNF).
We can note in passing that the fact that Table 8.2 contains the first and last names of Robert
Graves in two different rows suggests that it might be worthwhile to replace it with two further
tables, along the lines of:
Table 8.6
Author Last Name | Author First Name | Author Identification Number
Berdahl | Robert | 001
Yudof | Mark | 002
Harmon | Glynn | 003
Graves | Robert | 004
Miksa | Francis | 005
Hunter | David | 006
Table 8.7
Author Identification Number | Book Title
001 | The Politics of the Prussian Nobility
002 | Child Abuse and Neglect
003 | Human Memory and Knowledge
004 | The Golden Fleece
005 | Charles Ammi Cutter
006 | Music Publishing and Collecting
004 | English and Scottish Ballads
Though Tables 8.6 and 8.7 together take a little more space than Table 8.2, it is easy to see that
given a much larger collection, in which there would be many more authors with multiple works
to their credit, Tables 8.6 and 8.7 would be more economical of storage space than Table 8.2.
Furthermore, the structure of Tables 8.6 and 8.7 lessens the chance of making updating errors
(e.g., typing Grave instead of Graves, or Miska instead of Miksa).
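The step from Table 8.2 to Tables 8.6 and 8.7 can be sketched as follows. The function name split_authors is an assumption, and the sketch assumes that two rows with the same first and last name refer to the same author, which is true of this small example but would not be safe in a large collection.

```python
# Rows of Table 8.2: (Author Last Name, Author First Name, Book Title)
table_8_2 = [
    ("Berdahl", "Robert", "The Politics of the Prussian Nobility"),
    ("Yudof", "Mark", "Child Abuse and Neglect"),
    ("Harmon", "Glynn", "Human Memory and Knowledge"),
    ("Graves", "Robert", "The Golden Fleece"),
    ("Miksa", "Francis", "Charles Ammi Cutter"),
    ("Hunter", "David", "Music Publishing and Collecting"),
    ("Graves", "Robert", "English and Scottish Ballads"),
]

def split_authors(rows):
    """Give each distinct author an identification number and split the table in two."""
    author_ids = {}  # (last, first) -> identification number, in order of first appearance
    books = []       # (identification number, title), one row per book
    for last, first, title in rows:
        author = (last, first)
        if author not in author_ids:
            author_ids[author] = f"{len(author_ids) + 1:03d}"
        books.append((author_ids[author], title))
    authors = [(last, first, ident) for (last, first), ident in author_ids.items()]
    return authors, books

authors, books = split_authors(table_8_2)
print(len(authors), len(books))  # 6 7
print(books[-1])                 # ('004', 'English and Scottish Ballads')
```

Each author's name is now typed once, so a misspelling can occur in only one place.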
Section 9. The Boyce-Codd Normal Form (BCNF)
The Boyce-Codd Normal Form (BCNF) deals with the anomalies that can occur when a table
fails to have the property that every determinant is a candidate key. Here is an example, Table
9.1, that fails to have this property. (In Table 9.1 the SSNs are to be interpreted as those of
students with the stated majors and advisers. Note that each of students 123-45-6789 and 987-
65-4321 has two majors, with a different adviser for each major.)
Table 9.1
SSN | Major | Adviser
123-45-6789 | Library and Information Science | Dewey
123-45-6789 | Public Affairs | Roosevelt
222-33-4444 | Library and Information Science | Putnam
555-12-1212 | Library and Information Science | Dewey
987-65-4321 | Pre-Medicine | Semmelweis
987-65-4321 | Biochemistry | Pasteur
123-54-3210 | Pre-Law | Hammurabi
We begin by showing that Table 9.1 lacks the required property, viz., that every determinant be a
candidate key.
What are the determinants in Table 9.1? One determinant is the pair of attributes, SSN and Major.
Each distinct pair of values of SSN and Major determines a unique value for the attribute, Adviser.
Another determinant is the pair, SSN and Adviser, which determines unique values of the attribute,
Major. Still another determinant is the attribute, Adviser, for each different value of Adviser
determines a unique value of the attribute, Major. (These observations about Table 9.1 correspond
to the real-world facts that each student has a single adviser for each of his or her majors, and
each adviser advises in just one major.)
Now we need to examine these three determinants with respect to the question of whether they
are candidate keys. The answer is that the pair, SSN and Major, is a candidate key, for each such
pair uniquely identifies a row in Table 9.1. In similar fashion, the pair, SSN and Adviser, is a
candidate key. But the determinant, Adviser, is not a candidate key, because the value Dewey
occurs in two rows of the Adviser column. So Table 9.1 fails to meet the condition that every
determinant in it be a candidate key.
It is easy to check on the anomalies in Table 9.1. For example, if student 987-65-4321 were to
leave Enormous State University, the table would lose the information that Semmelweis is an
adviser for the Pre-Medicine major. As another example, Table 9.1 has no information about
advisers for students majoring in history.
As usual, the solution lies in constructing single-theme tables containing the information in Table
9.1. Here are two tables that will do the job.
Table 9.2
SSN | Adviser
123-45-6789 | Dewey
123-45-6789 | Roosevelt
222-33-4444 | Putnam
555-12-1212 | Dewey
987-65-4321 | Semmelweis
987-65-4321 | Pasteur
123-54-3210 | Hammurabi
Table 9.3
Major | Adviser
Library and Information Science | Dewey
Public Affairs | Roosevelt
Library and Information Science | Putnam
Pre-Medicine | Semmelweis
Biochemistry | Pasteur
Pre-Law | Hammurabi
History | Herodotus
By way of an example of the value of separating Table 9.1 into single-theme tables, Table 9.3
includes information about at least one faculty member at ESU who could be the adviser of a
student who wanted to major in history.
Tables 9.2 and 9.3 are in BCNF (in fact, they are in DKNF), since every determinant in them is
also a candidate key. You can easily verify this statement if you note that the key in Table 9.2 is
a composite key, SSN and Adviser.
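The determinant-by-determinant examination that was carried out for Table 9.1 can be sketched in code. The function names below are assumptions made for illustration. The sketch tests candidate keys only in the sense used above (the values uniquely identify a row) and does not test minimality, and it judges dependencies solely from the rows present.

```python
from itertools import combinations

COLUMNS = ("SSN", "Major", "Adviser")
# Rows of Table 9.1
TABLE_9_1 = [
    ("123-45-6789", "Library and Information Science", "Dewey"),
    ("123-45-6789", "Public Affairs", "Roosevelt"),
    ("222-33-4444", "Library and Information Science", "Putnam"),
    ("555-12-1212", "Library and Information Science", "Dewey"),
    ("987-65-4321", "Pre-Medicine", "Semmelweis"),
    ("987-65-4321", "Biochemistry", "Pasteur"),
    ("123-54-3210", "Pre-Law", "Hammurabi"),
]

def project(row, attrs):
    """Pick out the values of the named attributes from one row."""
    return tuple(row[COLUMNS.index(a)] for a in attrs)

def determines(rows, lhs, rhs):
    """True if the attributes in lhs functionally determine the attribute rhs."""
    seen = {}
    for row in rows:
        key, value = project(row, lhs), project(row, (rhs,))
        if seen.setdefault(key, value) != value:
            return False
    return True

def identifies_rows(rows, attrs):
    """True if no two rows share the same values for attrs (minimality not checked)."""
    values = [project(row, attrs) for row in rows]
    return len(values) == len(set(values))

def determinants(rows):
    """Attribute sets that determine at least one attribute outside themselves."""
    found = []
    for size in range(1, len(COLUMNS)):
        for lhs in combinations(COLUMNS, size):
            if any(determines(rows, lhs, rhs) for rhs in COLUMNS if rhs not in lhs):
                found.append(lhs)
    return found

found = determinants(TABLE_9_1)
violations = [lhs for lhs in found if not identifies_rows(TABLE_9_1, lhs)]
print(found)       # [('Adviser',), ('SSN', 'Major'), ('SSN', 'Adviser')]
print(violations)  # [('Adviser',)]
```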
Section 10. The 4th Normal Form (4NF)
The 4th Normal Form is concerned with the anomalies that can occur when a table fails to have
the property of containing no multivalued dependencies (i.e., the anomalies that can occur when
a table does have such dependencies). We develop below a table that has these undesirable
multivalued dependencies.
Suppose we have some information about the hobbies of some students at Enormous State
University and want to put this information into a database. Suppose, in particular, that Jack
Jones's hobbies are surfing the Internet and playing chess; Lynn Lee's, photography and stamp
collecting; Mary Ruiz's, surfing the Internet and photography; and Lynn Smith's, playing poker.
If we (foolishly) try to put all this information into just one table, here is what we get.
Table 10.1
LastName | Major | Hobby
Jones | Library and Information Science | Surfing the Internet
Jones | Library and Information Science | Chess
Jones | Public Affairs | Surfing the Internet
Jones | Public Affairs | Chess
Lee | Library and Information Science | Photography
Lee | Library and Information Science | Stamp collecting
Ruiz | Pre-Medicine | Surfing the Internet
Ruiz | Pre-Medicine | Photography
Ruiz | Biochemistry | Surfing the Internet
Ruiz | Biochemistry | Photography
Smith | Pre-Law | Playing poker
The problem is that Jack Jones, for example, has two majors and two hobbies. If we coupled each
of his majors with just one of his hobbies (e.g., LIS with chess, or Public Affairs with surfing the
Internet), we would imply that Jack plays chess only as an LIS major and surfs the Internet only as
a Public Affairs major. This would not make sense. (Note that in this relatively small and simple
example, it is obvious that such restrictive pairing does not make sense. In practice, however, the
problems arise in connection with much larger tables, where it may be very difficult to detect that
restrictive pairing has occurred.) To avoid such false implications, we enter all pairings of majors
and hobbies for all the students. Obviously, however, this approach has the problem of redundant
information. Equally obviously, updating this table presents anomalies; for example, you can work
out for yourself what would have to be added to Table 10.1 if Jones took up tennis as a third hobby.
This situation is an example of the effects of multivalued dependencies. A multivalued
dependency occurs when (a) a table has at least three attributes, (b) two of the attributes are
multivalued, and (c) the values of the multivalued attributes depend on only one of the remaining
attributes. Table 10.1 fits these specifications for the following reasons: The LastName attribute
determines multiple values of the attributes Major and Hobby, but neither of these latter
attributes depends on the other; they are independent.
The notation for multivalued dependency is a double arrow. In this example, we can write:
LastName →→ Major, and LastName →→ Hobby. We read these expressions as, "LastName
multidetermines Major" and "LastName multidetermines Hobby."
Once again, single-theme tables provide the solution. We break Table 10.1 down into the
following tables.
Table 10.2
LastName | Major
Jones | Library and Information Science
Jones | Public Affairs
Lee | Library and Information Science
Ruiz | Pre-Medicine
Ruiz | Biochemistry
Smith | Pre-Law
Table 10.3
LastName | Hobby
Jones | Surfing the Internet
Jones | Chess
Lee | Photography
Lee | Stamp collecting
Ruiz | Surfing the Internet
Ruiz | Photography
Smith | Playing poker
Tables 10.2 and 10.3 display, separately, the various students' majors and hobbies; and while
doing so, these tables correctly avoid suggesting any connections between particular majors and
particular hobbies.
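It may help to confirm that nothing is lost by the split. The sketch below is only an illustration, assuming the rows of Tables 10.2 and 10.3 as listed; the names table_10_2, table_10_3, and natural_join are ours. Joining the two tables on LastName yields again every pairing of majors and hobbies for each student.

```python
# Table 10.2 (LastName, Major) and Table 10.3 (LastName, Hobby).
table_10_2 = [
    ("Jones", "Library and Information Science"),
    ("Jones", "Public Affairs"),
    ("Lee", "Library and Information Science"),
    ("Ruiz", "Pre-Medicine"),
    ("Ruiz", "Biochemistry"),
    ("Smith", "Pre-Law"),
]
table_10_3 = [
    ("Jones", "Surfing the Internet"),
    ("Jones", "Chess"),
    ("Lee", "Photography"),
    ("Lee", "Stamp collecting"),
    ("Ruiz", "Surfing the Internet"),
    ("Ruiz", "Photography"),
    ("Smith", "Playing poker"),
]


def natural_join(majors, hobbies):
    """Join the two tables on LastName, giving (LastName, Major, Hobby) rows."""
    return [
        (name, major, hobby)
        for name, major in majors
        for other_name, hobby in hobbies
        if name == other_name
    ]


rebuilt = natural_join(table_10_2, table_10_3)
print(len(rebuilt))  # 11
print(rebuilt[0])    # ('Jones', 'Library and Information Science', 'Surfing the Internet')
```

The join produces the pairings only when they are asked for; the stored tables themselves state each major and each hobby once per student.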
Section 11. The 5th Normal Form (5NF) and the Domain-Key Normal Form (DKNF)
The 5th Normal Form is difficult to illustrate in terms of relatively simple examples. Hence, we
will not attempt to illustrate the 5NF property of having every join dependency in the table be a
consequence of the candidate keys of the table. This omission is a minor one, for at least two
reasons: First, in practice the 4NF is often regarded as sufficient; and second, the Domain-Key
Normal Form (DKNF) subsumes the 5NF.
The DKNF is important because it offers a complete solution to the problem of avoiding
anomalies: A set of tables (relations) that is in DKNF is known, as a consequence of a theorem
proved by Ronald Fagin in 1981, to be free of anomalies. We do not attempt here to reproduce
the proof of Fagin's theorem but merely to illustrate how the theorem can be applied in practice.
The DKNF definition is this: A relation is in DKNF if every constraint on the relation is a logical
consequence of the definitions of keys and domains. To understand what this definition means,
we begin by noting that the central ideas are embodied in the words "constraint," "key," and
"domain." By "key" Fagin means both primary keys and candidate keys. By "domain" Fagin
means the set of definitions of the contents of attributes (columns) and any limitations on the
kind of data to be stored in the columns, such as a limitation to only numeric data or only logical
data; in addition, domain limitations may include such matters as the format (e.g., a limitation on
numeric data to being expressed to exactly two decimal digits). By "constraint" Fagin means any
rule dealing with attributes that is clear enough so that one can decide whether the rule is upheld
or broken by any set of the data with which one is dealing.
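To make the three terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name major_level, its column names, and the helper function accepted are hypothetical choices of ours, not part of Fagin's definition. The PRIMARY KEY clause plays the role of the key definition, and the CHECK clause plays the role of a domain definition. The rule "each major has exactly one level" is then a constraint that follows from the key alone; nothing else has to enforce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE major_level (
        major TEXT NOT NULL PRIMARY KEY,                      -- key definition
        level TEXT NOT NULL
              CHECK (level IN ('Graduate', 'Undergraduate'))  -- domain definition
    )
    """
)


def accepted(major, level):
    """Try to insert a row; report whether the key and domain rules allowed it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO major_level VALUES (?, ?)", (major, level))
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted("LIS", "Graduate"))           # True
print(accepted("LIS", "Undergraduate"))      # False: would give LIS a second level
print(accepted("Pre-Law", "Sophomore"))      # False: value outside the domain
print(accepted("Pre-Law", "Undergraduate"))  # True
```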
There is an important qualification to be attached to the DKNF definition as presented in the
preceding paragraph. Fagin excludes constraints that are time-dependent or relate to changes
made in data values. That means that a time-dependent constraint (or other constraint on changes
in value) may exist in a table and may fail to be a logical consequence of the definitions of keys
and domains, yet the table may nevertheless be in DKNF.
As an illustration, some states have a property-tax rule specifying that the assessed value of the
primary-residence property owned by a citizen over 65 cannot be increased above the value that
was assessed in the year in which the property owner turned 65. The existence of such a rule
would not, in itself, prevent a table of properties and their assessed values from being in DKNF.
Achieving DKNF amounts to establishing a set of tables in each of which the constraints follow
logically from (i.e., are logical consequences of) the keys and the domain definitions. Although
there is no direct procedure for converting an arbitrary table into one or more tables each of
which is in DKNF, in practice the effort to replace an arbitrary table by a set of single-theme
tables achieves the goal. To show this, we consider some of the previous examples from the
DKNF point of view.
Section 11.1. Converting a Table with Partial Dependencies into DKNF Tables
Here once again is the table, Table 4.2, that we used in our discussion of the problem of partial
dependencies. Since we are going to use it here, we name this copy of it Table 11.1.1.
Table 11.1.1
FirstName | LastName | Major | Level
Jack | Jones | LIS | Graduate
Lynn | Lee | LIS | Graduate
Mary | Ruiz | Pre-Medicine | Undergraduate
Lynn | Smith | Pre-Law | Undergraduate
Jane | Jones | LIS | Graduate
Let us consider Table 11.1.1 from the DKNF point of view. First, we see that the key is composite,
consisting of the LastName-FirstName pair of attributes. We see also that all other attributes in
the table are dependent on this key. But there is another significant aspect to this table: the
Level attribute is dependent on the LastName attribute, i.e., Level is dependent on just part of
the key. (As noted earlier, this partial dependency is contrived, but nevertheless it illustrates
the problem of partial dependency.) Because Level is dependent on just LastName, the table fails
to be one in which all constraints are logical consequences of the key; hence, Table 11.1.1 is not
in DKNF.
From the DKNF point of view, therefore, we see that we should take the Level attribute out of
Table 11.1.1 and put it in some other table, or tables, where it will be a logical consequence of
the keys and domains. Clearly, a table that associates just the attributes Major and Level will
achieve this.
We will also need a table that provides the necessary link between the paired attributes,
FirstName and LastName, and the attribute Major. In such a table, the attribute Major will be a
logical consequence of the keys and domains.
Thus it appears that we need two tables, one containing just Major and Level, and the other
containing FirstName, LastName, and Major. We can indicate this more briefly as Table A:
(Major, Level) and Table B: (FirstName, LastName, Major).
Here are the tables.
Table 11.1.2 (Table A as described above)
Major | Level
LIS | Graduate
Pre-Medicine | Undergraduate
Pre-Law | Undergraduate
Table 11.1.3 (Table B as described above)
FirstName | LastName | Major
Jack | Jones | LIS
Lynn | Lee | LIS
Mary | Ruiz | Pre-Medicine
Lynn | Smith | Pre-Law
Jane | Jones | LIS
These are single-theme tables, and we arrived at them by steps aimed at achieving DKNF.
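The dependencies discussed in this section can be tested against the sample rows. The following is a sketch under the assumption that the five rows of Table 11.1.1 are the whole table; the function names determines and project are our own. In these rows, Level is fixed by LastName alone (the contrived partial dependency) and also by Major alone, which is why a table of just Major and Level can carry the Level information.

```python
COLUMNS = ("FirstName", "LastName", "Major", "Level")
table_11_1_1 = [
    dict(zip(COLUMNS, row))
    for row in [
        ("Jack", "Jones", "LIS", "Graduate"),
        ("Lynn", "Lee", "LIS", "Graduate"),
        ("Mary", "Ruiz", "Pre-Medicine", "Undergraduate"),
        ("Lynn", "Smith", "Pre-Law", "Undergraduate"),
        ("Jane", "Jones", "LIS", "Graduate"),
    ]
]


def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always go with equal values of rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def project(rows, attrs):
    """Distinct combinations of the given attributes, in first-seen order."""
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in result:
            result.append(values)
    return result


print(determines(table_11_1_1, ["FirstName", "LastName"], ["Major", "Level"]))  # True
print(determines(table_11_1_1, ["LastName"], ["Level"]))   # True: part of the key suffices
print(determines(table_11_1_1, ["Major"], ["Level"]))      # True
print(determines(table_11_1_1, ["FirstName"], ["Level"]))  # False: the two Lynns differ

table_a = project(table_11_1_1, ["Major", "Level"])                  # Table 11.1.2
table_b = project(table_11_1_1, ["FirstName", "LastName", "Major"])  # Table 11.1.3
print(table_a)
```

The projections reproduce the three rows of Table 11.1.2 and the five rows of Table 11.1.3.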
Section 11.2. Converting a Table with Transitive Dependencies into DKNF Tables
Here once again is the table, Table 8.1, that we used in our discussion of transitive dependencies.
Since we are going to use it here, we name this copy of it Table 11.2.1.
Table 11.2.1
Author Last Name | Author First Name | Book Title | Subject | Collection or Library | Building
Berdahl | Robert | The Politics of the Prussian Nobility | History | PCL General Stacks | Perry-Castañeda Library
Yudof | Mark | Child Abuse and Neglect | Legal Procedures | Law Library | Townes Hall
Harmon | Glynn | Human Memory and Knowledge | Cognitive Psychology | PCL General Stacks | Perry-Castañeda Library
Graves | Robert | The Golden Fleece | Greek Literature | Classics Library | Waggener Hall
Miksa | Francis | Charles Ammi Cutter | Library Biography | Library and Information Science Collection | Perry-Castañeda Library
Hunter | David | Music Publishing and Collecting | Music Literature | Fine Arts Library | Fine Arts Building
Graves | Robert | English and Scottish Ballads | Folksong | PCL General Stacks | Perry-Castañeda Library
You will recall from the discussion of this table as Table 8.1 that it exhibits the following
transitive dependencies: Book Title → Subject, Subject → Collection-Library, and
Collection-Library → Building. From the DKNF point of view, this means that the primary key,
Book Title, is not the only thing that determines the Collection-Library attribute and the
Building attribute. In turn, this means that there are constraints that are not logical
consequences of the key and, hence, that the table is not in DKNF.
Reasoning from the DKNF point of view, we would like to have a table in which the Building
attribute is a logical consequence of the key; constructing a table containing the Collection-
Library and Building attributes, with Collection-Library as key, will accomplish that. Again from
the DKNF point of view, we would like to have a table in which the Collection-Library attribute
is a logical consequence of the key; clearly, a table containing Subject (as key) and Collection-
Library suffices. The same point of view leads us to desire a table in which the Author First
Name and Author Last Name attributes will be a logical consequence of the key; such a table is
one that contains Book Title (as key), Author First Name, and Author Last Name. Finally, a table
that contains Book Title (as key) and Subject will be (1) a table in which the attribute Subject
will be a logical consequence of the key and (2) a table that provides the necessary connection
between Title and Subject.
Thus from the DKNF point of view, we are led to the same tables as previously:
Table 11.2.2
Author Last Name | Author First Name | Book Title
Berdahl | Robert | The Politics of the Prussian Nobility
Yudof | Mark | Child Abuse and Neglect
Harmon | Glynn | Human Memory and Knowledge
Graves | Robert | The Golden Fleece
Miksa | Francis | Charles Ammi Cutter
Hunter | David | Music Publishing and Collecting
Graves | Robert | English and Scottish Ballads
Table 11.2.3
Book Title | Subject
The Politics of the Prussian Nobility | History
Child Abuse and Neglect | Legal Procedures
Human Memory and Knowledge | Cognitive Psychology
The Golden Fleece | Greek Literature
Charles Ammi Cutter | Library Biography
Music Publishing and Collecting | Music Literature
English and Scottish Ballads | Folksong
Table 11.2.4
Subject | Collection or Library
History | PCL General Stacks
Legal Procedures | Law Library
Cognitive Psychology | PCL General Stacks
Greek Literature | Classics Library
Library Biography | Library and Information Science Collection
Music Literature | Fine Arts Library
Folksong | PCL General Stacks
Table 11.2.5
Collection or Library | Building
PCL General Stacks | Perry-Castañeda Library
Law Library | Townes Hall
Classics Library | Waggener Hall
Library and Information Science Collection | Perry-Castañeda Library
Fine Arts Library | Fine Arts Building
These are the tables presented in Section 8 as single-theme tables that solved the transitive-
dependency problem of Table 8.1. Here we have arrived at these same tables by considering how
the information in Table 11.2.1 (the same information as in Table 8.1) should be re-arranged
from the DKNF point of view.
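One way to see why the single-theme tables suffice is to follow the chain of dependencies through them. The sketch below assumes that Tables 11.2.3, 11.2.4, and 11.2.5 are each held as a simple lookup from key to value; the names subject_of, collection_of, building_of, and building_for are ours. Each fact is stored once, and the building for a book is derived by stepping from Book Title to Subject to Collection-Library to Building.

```python
# Table 11.2.3: Book Title -> Subject
subject_of = {
    "The Politics of the Prussian Nobility": "History",
    "Child Abuse and Neglect": "Legal Procedures",
    "Human Memory and Knowledge": "Cognitive Psychology",
    "The Golden Fleece": "Greek Literature",
    "Charles Ammi Cutter": "Library Biography",
    "Music Publishing and Collecting": "Music Literature",
    "English and Scottish Ballads": "Folksong",
}

# Table 11.2.4: Subject -> Collection or Library
collection_of = {
    "History": "PCL General Stacks",
    "Legal Procedures": "Law Library",
    "Cognitive Psychology": "PCL General Stacks",
    "Greek Literature": "Classics Library",
    "Library Biography": "Library and Information Science Collection",
    "Music Literature": "Fine Arts Library",
    "Folksong": "PCL General Stacks",
}

# Table 11.2.5: Collection or Library -> Building
building_of = {
    "PCL General Stacks": "Perry-Castañeda Library",
    "Law Library": "Townes Hall",
    "Classics Library": "Waggener Hall",
    "Library and Information Science Collection": "Perry-Castañeda Library",
    "Fine Arts Library": "Fine Arts Building",
}


def building_for(title):
    """Follow Book Title -> Subject -> Collection-Library -> Building."""
    return building_of[collection_of[subject_of[title]]]


print(building_for("The Golden Fleece"))        # Waggener Hall
print(building_for("Child Abuse and Neglect"))  # Townes Hall
```

Because the building of each collection appears in only one row of Table 11.2.5, a change of building would be a change to that one row.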
Section 11.3. Converting into DKNF a Table in Which Not Every Determinant Is a Candidate Key
Here is the table, Table 9.1, that we used earlier to illustrate the problem of a table in which not
every determinant is a candidate key. Since we are going to use it here, we name this copy of it
Table 11.3.1.
Table 11.3.1
SSN | Major | Adviser
123-45-6789 | Library and Information Science | Dewey
123-45-6789 | Public Affairs | Roosevelt
222-33-4444 | Library and Information Science | Putnam
555-12-1212 | Library and Information Science | Dewey
987-65-4321 | Pre-Medicine | Semmelweis
987-65-4321 | Biochemistry | Pasteur
123-54-3210 | Pre-Law | Hammurabi
You will recall from the discussion of this table as Table 9.1 that one determinant is the pair of
attributes, SSN and Major, which determines Adviser; another determinant is the pair, SSN and
Adviser, which determines Major; and still another is Adviser alone, which also determines Major.
And you will recall that the candidate keys are the pairs, SSN-Major and SSN-Adviser. The third
determinant, Adviser, is not a candidate key.
From the DKNF point of view, we reason as follows: If we choose SSN-Adviser as the key, then
Major is determined by, and hence is a logical consequence of, this key. If, instead, we choose
SSN-Major as the key, then Adviser is determined by, and hence is a logical consequence of, this
alternative key. But in either case, the third constraint, viz., that Adviser determines Major, is
not a logical consequence of the key. Hence, the table is not in DKNF.
In order to move from this table to a set of tables in DKNF, we can argue, from the DKNF point
of view, that we need to move Major into a table in which it will be a logical consequence of the
key. Such a table would obviously need to have Adviser as the key. If we put Adviser and Major
into such a table, then we will need at least one other table, viz., a table that provides the
necessary link between SSN and Adviser, so that we will know who each student's adviser is.
Once we have put SSN and Adviser into such a table, there is nothing further that needs to be
done.
Here are the tables.
Table 11.3.2
Major | Adviser
Library and Information Science | Dewey
Public Affairs | Roosevelt
Library and Information Science | Putnam
Pre-Medicine | Semmelweis
Biochemistry | Pasteur
Pre-Law | Hammurabi
History | Herodotus
Table 11.3.3
SSN | Adviser
123-45-6789 | Dewey
123-45-6789 | Roosevelt
222-33-4444 | Putnam
555-12-1212 | Dewey
987-65-4321 | Semmelweis
987-65-4321 | Pasteur
123-54-3210 | Hammurabi
These are the tables presented in Section 9 as single-theme tables that solved the failure of Table
9.1 to be in Boyce-Codd Normal Form. Here we have arrived at these same tables by considering
how the information in Table 11.3.1 (the same information as in Table 9.1) should be re-arranged
from the DKNF point of view.
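The reasoning about determinants and candidate keys can also be checked on the rows of Table 11.3.1. This is a rough sketch that looks only at the seven sample rows; the function names determines, is_superkey, and breaks_bcnf are hypothetical. A determinant causes trouble when it fixes another attribute without being able to identify a whole row.

```python
COLUMN_INDEX = {"SSN": 0, "Major": 1, "Adviser": 2}
table_11_3_1 = [
    ("123-45-6789", "Library and Information Science", "Dewey"),
    ("123-45-6789", "Public Affairs", "Roosevelt"),
    ("222-33-4444", "Library and Information Science", "Putnam"),
    ("555-12-1212", "Library and Information Science", "Dewey"),
    ("987-65-4321", "Pre-Medicine", "Semmelweis"),
    ("987-65-4321", "Biochemistry", "Pasteur"),
    ("123-54-3210", "Pre-Law", "Hammurabi"),
]


def values(row, attrs):
    """Pick the named attributes out of one row."""
    return tuple(row[COLUMN_INDEX[a]] for a in attrs)


def determines(rows, lhs, rhs):
    """True if equal lhs values always go with equal rhs values in these rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(values(row, lhs), values(row, rhs)) != values(row, rhs):
            return False
    return True


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    return len({values(row, attrs) for row in rows}) == len(rows)


def breaks_bcnf(rows, lhs, rhs):
    """A determinant that is not a superkey is the situation described above."""
    return determines(rows, lhs, rhs) and not is_superkey(rows, lhs)


print(breaks_bcnf(table_11_3_1, ["SSN", "Major"], ["Adviser"]))  # False
print(breaks_bcnf(table_11_3_1, ["SSN", "Adviser"], ["Major"]))  # False
print(breaks_bcnf(table_11_3_1, ["Adviser"], ["Major"]))         # True
```

Adviser fails the superkey test because Dewey appears in two rows, yet Adviser still fixes Major; that is the constraint that had to be moved into a table of its own.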
Section 11.4. Converting a Table with Multivalued Dependencies into DKNF
Here is the table, Table 10.1, that we used previously to illustrate the problem of multivalued
dependencies. Since we are going to use it here, we name this copy of it Table 11.4.1.
Table 11.4.1
LastName | Major | Hobby
Jones | Library and Information Science | Surfing the Internet
Jones | Library and Information Science | Chess
Jones | Public Affairs | Surfing the Internet
Jones | Public Affairs | Chess
Lee | Library and Information Science | Photography
Lee | Library and Information Science | Stamp collecting
Ruiz | Pre-Medicine | Surfing the Internet
Ruiz | Pre-Medicine | Photography
Ruiz | Biochemistry | Surfing the Internet
Ruiz | Biochemistry | Photography
Smith | Pre-Law | Playing poker
If we analyze Table 11.4.1 from the DKNF point of view, the first thing we see is that the key in
the table is composite. It is the triple, LastName-Major-Hobby. But in an intuitive sense, the
natural key would be just LastName, since we know that there are just four students involved and
that we are trying to present data about their majors and their hobbies. The complications arise
because some of the students have more than one major and/or more than one hobby. Another way
of putting it is that the complications of the table arise from the fact that we are trying to
display, in just one table, more information than it is practicable to display in a single table.
From the DKNF point of view, we have two constraints. One constraint concerns the natural key,
LastName, and the attribute, Major. If we set up one table that houses these attributes, then the
constraint on Major will be a logical consequence of the key, LastName. The other constraint
concerns the natural key, LastName, and the attribute, Hobby. If we set up a second table that
houses these attributes, then the constraint on Hobby will be a logical consequence of the key,
LastName. Having set up these two tables, we will find that there is nothing further to be done.
Here are the tables.
Table 11.4.2
LastName | Major
Jones | Library and Information Science
Jones | Public Affairs
Lee | Library and Information Science
Ruiz | Pre-Medicine
Ruiz | Biochemistry
Smith | Pre-Law
Table 11.4.3
LastName | Hobby
Jones | Surfing the Internet
Jones | Chess
Lee | Photography
Lee | Stamp collecting
Ruiz | Surfing the Internet
Ruiz | Photography
Smith | Playing poker
These are the tables presented in Section 10 as single-theme tables that solved the failure of
Table 10.1 to be in 4NF. Here we have arrived at these same tables by considering how the
information in Table 11.4.1 (the same information as in Table 10.1) should be re-arranged from
the DKNF point of view.
Section 11.5. Single-Theme Tables and the DKNF
What has the preceding discussion shown us?
We have seen that when we analyze, from the DKNF point of view, tables with various kinds of
problems, we find--again and again--that the solutions to the problems consist in turning a
complicated, multi-theme table into sets of single-theme tables, tables which satisfy the
requirements of the DKNF. If, on the other hand, we analyze a complicated, problem-laden table
from the point of view of turning it into a set of single-theme tables, we thereby achieve--again
and again--a set of tables that satisfy the requirements of the DKNF.
In short, sets of single-theme tables will almost always be sets of tables in DKNF and, as such,
will be sets of tables that avoid the various kinds of anomalies that we want to avoid.