Database Normalization


<p>Normalization</p>

<p>Constraints: There are two types of constraints: 1) constraints that define the permitted values an attribute can have, and 2) constraints that define the relationships between attributes.</p>

<p>Functional Dependency (FD): A functional dependency is denoted by →. The FD X → Y means that X uniquely determines Y, where X and Y are simple or composite attributes. The dependency X → Y holds if the following condition is satisfied: whenever two tuples T1 and T2 have the same value of X, they must also have the same value of Y. That is, the relationship between X and Y holds regardless of the values of the other attributes in the table. In simple words, for a given value of X there is always a single value of Y. For example:</p>

<p>Street of a City → Pin Code: a street of a city has a unique pin code; however, the reverse need not be true.</p>

<p>ISBN → (Title, Author): given an ISBN, one can find the title of the book and the name of its author.</p>

<p>1 | Page Written By Arnav Mukhopadhyay EMAIL:</p>

<p>Implications and Covers: The application may call for some functional dependencies that imply additional functional dependencies. If F is the set of FDs, then the closure of F, denoted F+, is the set of all FDs implied by F. To find F+ from F, we need inference rules that derive the FDs implied by F. Inference rules are very important for good database design for the following reasons:</p>

<p>1) Given F, one may want to determine whether an FD X → Y is implied by F.</p>

<p>2) They are used to compute the closure F+ of F.</p>

<p>3) Given F, we may want to remove the FDs that are redundant in F. An FD is redundant if it is implied by the other FDs in F.</p>

<p>4) While designing a database schema, we find a minimal cover G of F: a set of FDs that contains no redundant FDs and for which G+ is the same as F+. It is then enough for the DBMS to enforce the constraints in G, because doing so automatically enforces all the constraints implied by G. 
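</p>

<p>Testing whether an FD is implied by F, and whether an FD in F is redundant, can be sketched in Python. This sketch is an assumption-based illustration rather than a method given in these notes: it uses the standard attribute-closure technique (X → Y is implied by F exactly when every attribute of Y lies in X+, the set of attributes determined by X under F), and the helper names fd, attribute_closure, implies and redundant_fds are hypothetical.</p>

```python
from typing import FrozenSet, List, Tuple

# An FD X -> Y is stored as a pair (X, Y) of frozensets of attribute names.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names, e.g. fd("A", "B,C")."""
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def attribute_closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """Return X+, the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the closure contains X, it must also contain Y for X -> Y.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return frozenset(closure)


def implies(fds: List[FD], dep: FD) -> bool:
    """X -> Y is in F+ exactly when Y is a subset of X+ computed under F."""
    lhs, rhs = dep
    return rhs <= attribute_closure(lhs, fds)


def redundant_fds(fds: List[FD]) -> List[FD]:
    """Return every FD that is implied by the remaining FDs in the list.

    Each FD is tested against all the others at once, so when building a
    minimal cover, remove reported FDs one at a time and re-check.
    """
    return [dep for i, dep in enumerate(fds)
            if implies(fds[:i] + fds[i + 1:], dep)]


if __name__ == "__main__":
    F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
    print(implies(F, fd("A", "B,C")))  # True
    print(implies(F, fd("C", "A")))    # False
    print(redundant_fds(F))            # [(frozenset({'A'}), frozenset({'C'}))]
```

<p>Here A → C is reported as redundant because it already follows from A → B and B → C, so dropping it leaves F+ unchanged.</p>

<p>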
Inference rules for FDs: The inference rules for FDs are known as Armstrong's Axioms, after Armstrong, who published them. They are as follows:</p>

<p>1. Reflexive property: X → Y holds if Y is a subset of X.</p>

<p>2. Augmentation property: If X → Y holds, then XZ → YZ also holds.</p>

<p>3. Transitivity property: If X → Y and Y → Z hold, then X → Z is implied.</p>

<p>4. Union property: If X → Y and X → Z hold, then X → YZ also holds. This property indicates that FDs with the same left-hand side can be combined into a single FD.</p>

<p>5. Decomposition property: If X → Y is implied and Z is a subset of Y, then X → Z is implied. This property is the reverse of the union property: if the right-hand side of an FD contains many attributes, then an FD exists for each of them.</p>

<p>6. Pseudo-transitivity property: If X → Y and WY → Z hold, then XW → Z holds.</p>

<p>Properties 1 to 3 are Armstrong's axioms proper; properties 4 to 6 can be derived from them.</p>

<p>Example: Consider a college with a table STUDY having COURSE, TEACHER, ROOM NO and DEPARTMENT as attributes: STUDY (COURSE, TEACHER, ROOMNO, DEPT). Here we can identify a few FDs, namely:</p>

<p>Course → Teacher</p>
<p>Teacher → Department</p>
<p>Course → Room Number</p>

<p>Additional FDs can be derived from these by using the inference properties:</p>

<p>By reflexivity: (Course, Teacher) → Teacher</p>
<p>By augmentation: (Course, Room Number) → (Teacher, Room Number)</p>
<p>By transitivity: Course → Department</p>
<p>By union: Course → (Teacher, Room Number)</p>

<p>Armstrong's axioms are sound and complete:</p>

<p>1. Soundness property: If X → Y can be inferred from F using the above axioms, then X → Y holds in every relation in which F holds.</p>

<p>2. Completeness property: Every FD implied by F can be inferred from F using the axioms. Equivalently, if X → Y cannot be inferred from F, then there is some relation R in which F holds but X → Y does not.</p>

<p>Normalization: Consider the table shown below:</p>

<table>
<tr><th rowspan="2">Order No.</th><th rowspan="2">Order Date</th><th colspan="3">Item Lines</th></tr>
<tr><th>Item Code</th><th>Quantity</th><th>Price / Unit</th></tr>
<tr><td rowspan="3">1456</td><td rowspan="3">260289</td><td>3687</td><td>52</td><td>50.40</td></tr>
<tr><td>4627</td><td>38</td><td>60.20</td></tr>
<tr><td>3214</td><td>20</td><td>17.50</td></tr>
<tr><td rowspan="2">1886</td><td rowspan="2">040389</td><td>4629</td><td>45</td><td>20.25</td></tr>
<tr><td>4627</td><td>30</td><td>60.20</td></tr>
<tr><td>1788</td><td>040489</td><td>4627</td><td>40</td><td>60.20</td></tr>
</table>

<p>Table 1: An Unnormalized Relation</p>

<p>In this relation, an order number includes many items. The attribute Item Lines is not a single attribute but is composed of several attributes (Item Code, Quantity and Price / Unit). Besides this, the number of item lines varies from order to order. This form is not suitable for storage as a file in a computer. Further, retrieving data based on a component of a composite attribute is difficult. For example, to find out how many items with a specified item code are ordered, one must first break up the composite attribute before attempting a search. Thus a relation with a format such as that in Table 1 is not allowed; it is said to be unnormalized.</p>

<p>To normalize this relation, each composite attribute is converted into individual attributes. The normalization step consists of first identifying the fields within a composite attribute as individual attributes. After this, the attributes common to a composite attribute are duplicated as many times as there are lines in the composite attribute. The normalized relation corresponding to the relation in Table 1 is shown in Table 2 below:</p>

<p>Table 2: Normalized form of the Relation given in Table 1</p>

<table>
<tr><th>Order No.</th><th>Order Date</th><th>Item Code</th><th>Quantity</th><th>Price / Unit</th></tr>
<tr><td>1456</td><td>260289</td><td>3687</td><td>52</td><td>50.40</td></tr>
<tr><td>1456</td><td>260289</td><td>4627</td><td>38</td><td>60.20</td></tr>
<tr><td>1456</td><td>260289</td><td>3214</td><td>20</td><td>17.50</td></tr>
<tr><td>1886</td><td>040389</td><td>4629</td><td>45</td><td>20.25</td></tr>
<tr><td>1886</td><td>040389</td><td>4627</td><td>30</td><td>60.20</td></tr>
<tr><td>1788</td><td>040489</td><td>4627</td><td>40</td><td>60.20</td></tr>
</table>

<p>The relation shown in Table 2 is said to be in First Normal Form, abbreviated as 1NF. This form is also called a flat file. It has no composite attributes; every attribute is atomic and describes only one property. Converting a relation to 1NF is the first essential step in normalization. 
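</p>

<p>The flattening step described above can be sketched in Python. This is a minimal sketch, assuming each order of Table 1 is held as a dictionary whose hypothetical "item_lines" entry lists (Item Code, Quantity, Price / Unit) tuples; the function names to_first_normal_form and total_quantity are illustrative only.</p>

```python
from typing import Dict, List, Tuple

# Table 1 (unnormalized): each order has a repeating group of item lines,
# each line being (item_code, quantity, price_per_unit).
# Order dates are kept as the strings printed in the table.
TABLE_1: List[Dict] = [
    {"order_no": 1456, "order_date": "260289",
     "item_lines": [(3687, 52, 50.40), (4627, 38, 60.20), (3214, 20, 17.50)]},
    {"order_no": 1886, "order_date": "040389",
     "item_lines": [(4629, 45, 20.25), (4627, 30, 60.20)]},
    {"order_no": 1788, "order_date": "040489",
     "item_lines": [(4627, 40, 60.20)]},
]

# A 1NF row: (order_no, order_date, item_code, quantity, price_per_unit)
Row = Tuple[int, str, int, int, float]


def to_first_normal_form(orders: List[Dict]) -> List[Row]:
    """Split the composite Item Lines attribute into atomic attributes and
    repeat Order No. and Order Date once per item line (as in Table 2)."""
    rows: List[Row] = []
    for order in orders:
        for item_code, quantity, price in order["item_lines"]:
            rows.append((order["order_no"], order["order_date"],
                         item_code, quantity, price))
    return rows


def total_quantity(rows: List[Row], item_code: int) -> int:
    """Sum the quantity ordered for one item code across all orders."""
    return sum(qty for _, _, code, qty, _ in rows if code == item_code)


if __name__ == "__main__":
    table_2 = to_first_normal_form(TABLE_1)
    for row in table_2:
        print(row)
    print(total_quantity(table_2, 4627))  # 108
```

<p>Once the relation is flat, the query mentioned for Table 1 becomes a simple scan over single-valued attributes: for item code 4627 the total quantity ordered is 38 + 30 + 40 = 108.</p>

<p>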
There are successively higher normal forms, known as 2NF, 3NF, BCNF, 4NF and 5NF. Each form is an improvement over the previous one: 2NF is an improvement over 1NF, 3NF is an improvement over 2NF, and so on. A relation in a higher normal form is also in every lower normal form, so the set of higher normal form relations is a SUBSET of the lower normal form relations, as shown in Figure 1.</p>

<p>Figure 1: Illustration of successive normal forms of a relation (nested regions, with 1NF outermost, then 2NF, 3NF, BCNF, 4NF, and 5NF innermost).</p>

<p>The higher normalization steps are based on three important concepts:</p>

<p>1) Dependence among attributes in a relation.</p>
<p>2) Identification of an attribute or a set of attributes as the key of a relation.</p>
<p>3) Multivalued dependency between attributes.</p>

<p>Functional Dependency: Remember that there is no foolproof algorithmic method to identify functional dependencies; they must be identified from the meaning of the data. Let X and Y be two attributes of a relation. If, for a given value of X, there is only one corresponding value of Y, then Y is said to be functionally dependent on X. This is indicated by the notation X → Y.</p>

<p>For example, given the value of an item code, there is only one value of item name for it. Thus item name is functionally dependent on item code:</p>

<p>Item Code → Item Name</p>

<p>Similarly, in Table 2, given an order number, the date of the order is known. Thus:</p>

<p>Order No. → Order Date</p>

<p>A functional dependency may be based on a composite attribute. For example, if we write</p>

<p>X, Z → Y</p>

<p>it means that there is only one value of Y corresponding to given values of X and Z. In other words, Y is functionally dependent on the composite X, Z. In Table 2, for example, Order No. and Item Code together with Quantity determine Price. Thus:</p>

<p>Order No., Item Code, Quantity → Price</p>

<p>Example: Consider the relation Student (Roll No., Name, Address, Dept., Year of Study). In this relation, Name is functionally dependent on Roll No. 
In fact, given the value of Roll No., the values of all the other attributes can be uniquely determined. Name and Department are not functionally dependent on each other: for example, given the name of a student, one cannot find his department uniquely, because there may be more than one student with the same name. Name in this case is not a key. Department and Year of Study are not functionally dependent on each other either, as Year of Study pertains to a student whereas Department is an independent attribute. The functional dependencies in this relation are shown in Figure 2 as a dependency diagram. Such dependency diagrams are very useful in normalization.</p>

<p>Figure 2: Dependency diagram for the relation Student (Roll No. → Name, Address, Department, Year of Study).</p>

<p>Relation Key: Given a relation, if the value of an attribute X uniquely determines the values of all other attributes in a row, then X is said to be the key of the relation. Sometimes more than one attribute is needed to uniquely determine the other attributes in a row; in that case, the set of attributes forms the key. In Table 2, Order No. and Item Code together determine Order Date, Quantity and Price, so the key is the combination (Order No., Item Code). Similarly, in the relation Supplies (Vendor Code, Item Code, Quantity Supplied, Date of Supply, Price/Unit), Vendor Code and Item Code together form the key. 
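</p>

<p>Checking a proposed key such as (Order No., Item Code) against the rows of Table 2 can be sketched in Python. This is an illustrative sketch with a hypothetical helper name, is_unique_in, assuming Table 2 is held as a list of dictionaries keyed by attribute name. It tests whether any two rows share the same values for the proposed key attributes; such a check can only refute a key, never prove one.</p>

```python
from typing import Dict, List, Sequence

COLUMNS = ("Order No.", "Order Date", "Item Code", "Quantity", "Price/Unit")

# Table 2 rows, keyed by attribute name.
TABLE_2: List[Dict] = [dict(zip(COLUMNS, values)) for values in [
    (1456, "260289", 3687, 52, 50.40),
    (1456, "260289", 4627, 38, 60.20),
    (1456, "260289", 3214, 20, 17.50),
    (1886, "040389", 4629, 45, 20.25),
    (1886, "040389", 4627, 30, 60.20),
    (1788, "040489", 4627, 40, 60.20),
]]


def is_unique_in(rows: List[Dict], attrs: Sequence[str]) -> bool:
    """Return True if no two rows agree on all of attrs, i.e. in this
    instance the values of attrs identify a single row (and therefore
    determine every other attribute of that row)."""
    seen = set()
    for row in rows:
        key_value = tuple(row[a] for a in attrs)
        if key_value in seen:
            return False
        seen.add(key_value)
    return True


if __name__ == "__main__":
    print(is_unique_in(TABLE_2, ["Order No.", "Item Code"]))  # True
    print(is_unique_in(TABLE_2, ["Order No."]))               # False
    print(is_unique_in(TABLE_2, ["Item Code"]))               # False
```

<p>As a caution, Quantity alone also happens to be unique in these six rows, yet it is clearly not a key of Table 2. This is why keys, like functional dependencies, have to be identified from the meaning of the attributes rather than from one set of rows.</p>

<p>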
The dependencies of the relation Supplies on its composite key are shown in the dependency diagram of Figure 3:</p>

<p>Figure 3: Dependency diagram for the relation Supplies ((Vendor Code, Item Code) → Quantity Supplied, Date of Supply, Price/Unit).</p>

<p>Observe that in the figure, the fact that Vendor Code and Item Code together form the composite key is clearly shown by enclosing them together in a rectangle.</p>

<p>Why do we Normalize a Relation? Relations are normalized so that, when the relations in a database are altered during the lifetime of the database, we do not lose information or introduce inconsistencies. The types of alterations normally needed for relations are:</p>

<p>1) Insertion of new data values into a relation. This should be possible without being forced to leave blank fields for some attributes.</p>

<p>2) Deletion of a tuple, namely a row of a relation. This should be possible without unknowingly losing vital information.</p>

<p>3) Updating or changing the value of an attribute in a tuple. This should be possible without exhaustively searching all the tuples in the relation.</p>

<p>Ideal relations after normalization should have the following properties, so that the problems mentioned above do not occur:</p>

<p>1) No data value should be duplicated in different rows unnecessarily.</p>

<p>2) A value must be specified (and required) for every attribute in a row.</p>

<p>3) Each relation should be self-contained. In other words, if a row is deleted from a relation, important information should not be accidentally lost.</p>

<p>4) When a row is added to a relation, other relations in the database should not be affected.</p>

<p>5) A value of an attribute in a tuple may be changed independently of other tuples in the relation and of other relations. 
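</p>

<p>The deletion and update problems listed above can be made concrete on Table 2. The following Python sketch is illustrative only, with hypothetical helper names (price_of, delete_order_line, rows_holding_price): it deletes the row for Order No. 1886, Item Code 4629, and counts how many rows would have to change together to alter the price of Item Code 4627.</p>

```python
from typing import List, Optional, Tuple

# Table 2 rows: (order_no, order_date, item_code, quantity, price_per_unit)
Row = Tuple[int, str, int, int, float]
TABLE_2: List[Row] = [
    (1456, "260289", 3687, 52, 50.40),
    (1456, "260289", 4627, 38, 60.20),
    (1456, "260289", 3214, 20, 17.50),
    (1886, "040389", 4629, 45, 20.25),
    (1886, "040389", 4627, 30, 60.20),
    (1788, "040489", 4627, 40, 60.20),
]


def price_of(rows: List[Row], item_code: int) -> Optional[float]:
    """Return the price recorded for an item, or None if no row mentions it."""
    for _, _, code, _, price in rows:
        if code == item_code:
            return price
    return None


def delete_order_line(rows: List[Row], order_no: int, item_code: int) -> List[Row]:
    """Deletion: drop the tuple for one item of one order."""
    return [r for r in rows if not (r[0] == order_no and r[2] == item_code)]


def rows_holding_price(rows: List[Row], item_code: int) -> int:
    """Update: count the tuples that repeat an item's price and would all
    need to be changed together to keep the relation consistent."""
    return sum(1 for r in rows if r[2] == item_code)


if __name__ == "__main__":
    after = delete_order_line(TABLE_2, 1886, 4629)
    print(price_of(TABLE_2, 4629))            # 20.25
    print(price_of(after, 4629))              # None: price lost with the row
    print(rows_holding_price(TABLE_2, 4627))  # 3
```

<p>After the deletion, no remaining row records the price of item 4629, and changing the price of item 4627 means finding and updating three rows consistently.</p>

<p>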
The idea of normalizing relations to higher and higher normal forms is to obtain a set of ideal relations meeting the above criteria.</p>

<p>Second Normal Form (2NF) Relation: A relation is said to be in 2NF if:</p>

<p>1) It is already in 1NF.</p>

<p>2) Its non-key attributes are functionally dependent on the key attributes.</p>

<p>3) If the key has more than one attribute, no non-key attribute is functionally dependent on only part of the key.</p>

<p>Example: Consider Table 2, which is in 1NF, with Key = (Order No., Item Code). The dependency diagram for the attributes of this relation is shown in Figure 4. The non-key attribute Price/Unit is functionally dependent on Item Code, which is part of the relation key. Also, the non-key attribute Order Date is functionally dependent on Order No., which is part of the relation key. As these non-key attributes depend on only part of the key, the relation in Table 2 is not in 2NF.</p>

<p>To obtain the 2NF form of Table 2, we split it into the three relations shown in Table 3. The relation Orders has Order No. as its key. The relation Order Details has the composite key (Order No., Item Code). The relation Prices has Item Code as its key. In all three relations, the non-key attributes are functionally dependent on the whole key. Observe that transforming to 2NF relations removes the repetition of Order Date found in Table 2. Further, if an order for an item is cancelled, the price of the item is not lost.</p>

<p>Orders</p>

<table>
<tr><th>Order No.</th><th>Order Date</th></tr>
<tr><td>1456</td><td>260289</td></tr>
<tr><td>1886</td><td>040389</td></tr>
<tr><td>1788</td><td>040489</td></tr>
</table>

<p>Order Details</p>

<table>
<tr><th>Order No.</th><th>Item Code</th><th>Quantity</th></tr>
<tr><td>1456</td><td>3687</td><td>52</td></tr>
<tr><td>1456</td><td>4627</td><td>38</td></tr>
<tr><td>1456</td><td>3214</td><td>20</td></tr>
<tr><td>1886</td><td>4629</td><td>45</td></tr>
<tr><td>1886</td><td>4627</td><td>30</td></tr>
<tr><td>1788</td><td>4627</td><td>40</td></tr>
</table>

<p>Prices</p>

<table>
<tr><th>Item Code</th><th>Price / Unit</th></tr>
<tr><td>3687</td><td>50.40</td></tr>
<tr><td>4627</td><td>60.20</td></tr>
<tr><td>3214</td><td>17.50</td></tr>
<tr><td>4629</td><td>20.25</td></tr>
</table>

<p>Table 3: Splitting of Relation given in Table 2 into 2NF Relations.</p>

<p>Figure 4: Dependency diagram for the relation given in Table 2 (Order No. → Order Date; Item Code → Price/Unit; (Order No., Item Code) → Quantity).</p>

<p>If Order No. 
1886 for Item Code 4629 is cancelled in Table 2, then the fourth row will be removed and the price of the item will be lost.</p>

<p>In Table 3, only the fourth row of the centre table (Order Details) will be removed. The item price is present in the right table (Prices) of Table 3, which is not touched, and hence the price of Item 4629 is not lost. The date of the order is also not lost, as it is in the left table (Orders) of Table 3.</p>

<p>These relations in 2NF meet all the ideal conditions specified. Observe that the three relations obtained are self-contained and that there is no duplication of data within a relation.</p>

<p>Third Normal Form (3NF): 3NF normalization is needed when the attributes in a relation tuple are not functionally dependent on the key alone. If one non-key attribute is functionally dependent on another non-key attribute, there will be unnecessary duplication of data. Consider the relation given in Table 4.</p>

<p>Table 4: A 2NF Form Relation</p>

<table>
<tr><th>Roll No.</th><th>Name</th><th>Department</th><th>Year</th><th>Hostel Name</th></tr>
<tr><td>1784</td><td>Raman</td><td>Physics</td><td>1</td><td>Ganga</td></tr>
<tr><td>1648</td><td>Krishnan</td><td>Chemistry</td><td>1</td><td>Ganga</td></tr>
<tr><td>1768</td><td>Gopalan</td><td>Mathematics</td><td>2</td><td>Kaveri</td></tr>
<tr><td>1848</td><td>Raja</td><td>Botany</td><td>2</td><td>Kaveri</td></tr>
<tr><td>1682</td><td>Maya</td><td>Geology</td><td>3</td><td>Krishna</td></tr>
<tr><td>1485</td><td>Singh</td><td>Zoology</td><td>4</td><td>Godavari</td></tr>
</table>

<p>Here, Roll No. is the key and all other attributes are functionally dependent on it, so the relation is in 2NF. If it is known that in the college all 1st Year students are accommodated in Ganga Hostel, all 2nd Year students in Kaveri, all 3rd Year students in Krishna, and all 4th Year students in Godavari, then the non-key attribute Hostel Name is dependent on the non-key attribute Year. This dependency is shown in Figure 5. Observe that given the year of a student, his hostel is known, and vice versa.</p>

<p>Figure 5: Dependency diagram for the relation given in Table 4.</p> <p>Name</p> <p>Departme...</p>