Eliminating Process of Normalization in Relational Database Design


  • 7/29/2019 Eliminating Process of Normalization in Relational Database Design


    ELIMINATING PROCESS OF NORMALIZATION IN RELATIONAL DATABASE DESIGN

    Tauqeer Hussain*, Shafay Shamail, Mian M. Awais

    (tauqeer, sshamail, awais)@lums.edu.pk
    Department of Computer Science
    Lahore University of Management Sciences (LUMS), Lahore, Pakistan

    Abstract - The relational database design approach requires the process of normalization in order to minimize data redundancy and update anomalies in the relational schema. Algorithms defined in normalization theory depend upon various dependencies, namely functional, multi-valued, join and inclusion dependencies, that should be carefully defined for a database application. Identification of these dependencies and of a minimal cover is a complex and time-consuming task for almost all practical problems. This paper discusses how the normalization process can be eliminated from the required steps of database design. It explores various constructs of the Entity Relationship Diagram (ERD) and their transformation to a relational schema. This paper elaborates how un-normalized relations are created during the Entity Relationship (ER) model to relational schema transformation. A set of rules is presented which, if followed at the stage of conceptual modeling, would always generate a relational schema that satisfies normal forms up to Boyce-Codd Normal Form (BCNF), thus eliminating the need for normalization. The motivation behind this paper is to save the database designer's valuable time and effort otherwise required in defining dependencies, in finding a minimal cover and in normalizing a given relational schema.

    Keywords: ER model, conceptual modeling, relational database schema, normalization, functional dependency

    * Corresponding author

    1. INTRODUCTION
    At a logical level, the objective of designing a good relational database is to define a set of relation schemas that are free of data redundancy and of data anomalies such as insert, update and delete anomalies. To date, this objective is still achieved by the normalization process, which may start from first normal form (1NF) and, depending upon the extent of purification desired in the ultimate database, may go up to domain-key normal form (DKNF), also known as 6NF. This whole process requires defining a set of functional, multi-valued and inclusion dependencies describing important constraints of the real-world problem.

    There are two main techniques for relational database design: top-down design and bottom-up design [6]. In bottom-up design, a universal relation, which consists of all the database attributes, is defined, and then, based upon a given set of dependencies, the normalization algorithms defined in the literature [2, 3, 13, 6] are repeatedly applied. These algorithms decompose the universal relation into a set of relations which satisfy a given normal form. On the other hand, top-down design is defined in three steps: 1) Conceptual modeling: the data requirements are conceptualized using a conceptual model representing the semantics of the real world; 2) Mapping: the conceptual model is transformed into a set of candidate relations; and 3) Normalization: the candidate relations are further refined to remove data redundancy [12]. Comparing these two approaches, the bottom-up design approach has been criticized due to the problems of a universal relation [9, 10], whereas top-down design is used most extensively in commercial database design [6]. In either case, the normalization process requires a set of dependencies to be defined for every problem. This set may itself contain redundant dependencies, which are eliminated by finding a minimal cover of the set. Nevertheless, finding a minimal cover is a complex task for almost all practical problems. Further, in pursuit of defining a set of dependencies that encompasses all real-world constraints of the problem, a database designer may include dependencies that are valid but irrelevant to the problem from the normalization point of view [14]. The presence of these dependencies makes the task of normalization more difficult, and identifying these valid but irrelevant dependencies is not easy either.

    408 Proceedings IEEE INMIC 2003

    This gives us the motivation to review the three-step database design methodology of the top-down approach with the aim of eliminating the normalization step, if possible. This paper explores how various constructs of an ERD are transformed into relations and how violations of normal forms might occur. It then suggests how an ERD should be improved to depict a better conceptual model which, when transformed into a relational schema using mapping algorithms, results in normalized relations. This paper shows that getting un-normalized relations from an ER model is an indication that the ER model itself needed improvement. This improvement can be made in the light of a given set of functional dependencies (FDs), as these dependencies represent the actual real-world constraints and thus cannot be ignored. In order to improve the semantics of the ER model, these dependencies should be defined and represented during the first step, the conceptual design phase.
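    The minimal-cover step that the introduction describes as laborious can be made concrete with a short sketch. The Python below is an illustration only, not the authors' method: it follows the standard textbook procedure based on attribute closure (split right-hand sides, drop extraneous left-hand-side attributes, drop redundant FDs). The function names `closure` and `minimal_cover` and the representation of an FD as a pair of attribute sets are assumptions of this sketch; the input is the EMP dependency set (FD1 and FD2) used later in the paper.

    ```python
    def closure(attrs, fds):
        """Return the closure X+ of the attribute set `attrs` under `fds`."""
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                # If the left-hand side is already derived, add the right-hand side.
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return frozenset(result)


    def minimal_cover(fds):
        """Compute one minimal cover of `fds`, a list of (lhs, rhs) attribute-set pairs."""
        # Step 1: split every FD so that each right-hand side is a single attribute.
        g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
        # Step 2: drop extraneous left-hand-side attributes (B is extraneous in
        # X -> A when A is still in the closure of X - {B}).
        reduced = []
        for lhs, rhs in g:
            lhs = set(lhs)
            for b in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                    lhs.discard(b)
            reduced.append((frozenset(lhs), rhs))
        # Step 3: remove duplicates, then drop every FD implied by the others.
        cover = list(dict.fromkeys(reduced))
        for fd in list(cover):
            rest = [f for f in cover if f != fd]
            if fd[1] <= closure(fd[0], rest):
                cover = rest
        return cover


    # FD1: ID -> Name, Dept#, DeptName and FD2: Dept# -> DeptName
    emp_fds = [({"ID"}, {"Name", "Dept#", "DeptName"}), ({"Dept#"}, {"DeptName"})]
    for lhs, rhs in minimal_cover(emp_fds):
        print(sorted(lhs), "->", sorted(rhs))
    # ['ID'] -> ['Dept#']
    # ['ID'] -> ['Name']
    # ['Dept#'] -> ['DeptName']
    ```

    The redundant dependency ID -> DeptName disappears because it follows from ID -> Dept# and Dept# -> DeptName. A minimal cover is not unique in general; this sketch returns whichever one its fixed processing order produces.
    
    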

    2. ER-TO-RELATIONAL SCHEMA MAPPING
    The entity relationship approach was initially proposed by Chen [1], and powerful extensions were gradually suggested in the literature [11, 4, 5, 12, 8, 7]. The ER-to-relational schema mapping rules are available in the literature [1, 6]. These rules are summarized in the algorithm below, and the nomenclature used is given in Table 1.
    ALGORITHM:
    Let $S = \{SE_1, SE_2, \ldots, SE_m, WE_1, WE_2, \ldots, WE_n, R_1, R_2, \ldots, R_p\}$.
    Now the mapping algorithm can be defined as:
    1. For every $E_i \in \{SE_1, SE_2, \ldots, SE_m\}$, create a relation $S_i$ such that:
    $Attr(S_i) = SimpAttr(E_i)$, and
    $PK(S_i) = KeyAttr(E_i)$

    2. For every $E_i \in \{WE_1, WE_2, \ldots, WE_n\}$, create a relation $S_i$ such that:
    $Attr(S_i) = SimpAttr(E_i) \cup PK(T)$, where $T$ is the strong entity type $SE_j$ that owns $E_i$, and
    $PK(S_i) = PartialKeyAttr(E_i) \cup PK(T)$
    3. For every $R_i \in \{R_1, R_2, \ldots, R_p\}$, $S(R_i : E_j, E_k)$

    a) if
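    Rules 1 and 2 of the mapping algorithm can be sketched in Python as follows. This is a hypothetical illustration under stated assumptions: the `EntityType` and `Relation` classes and the `map_entity` function are invented here, composite attributes are assumed to be already flattened into simple attributes, and the EMPLOYEE/DEPENDENT example is illustrative rather than taken from the paper. Rule 3 (relationship types) is not covered.

    ```python
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class EntityType:
        name: str
        simple_attrs: list[str]                # SimpAttr(E), composite attributes already flattened
        key_attrs: list[str]                   # KeyAttr(E), or PartialKeyAttr(E) for a weak entity type
        owner: Optional["EntityType"] = None   # owner T of a weak entity type; None if strong


    @dataclass
    class Relation:
        name: str
        attrs: list[str]
        pk: list[str]


    def map_entity(e: EntityType) -> Relation:
        if e.owner is None:
            # Rule 1 (strong entity type): Attr(S) = SimpAttr(E), PK(S) = KeyAttr(E).
            return Relation(e.name, list(e.simple_attrs), list(e.key_attrs))
        # Rule 2 (weak entity type): Attr(S) = SimpAttr(E) ∪ PK(T),
        # PK(S) = PartialKeyAttr(E) ∪ PK(T).
        owner_pk = map_entity(e.owner).pk
        attrs = list(e.simple_attrs) + [a for a in owner_pk if a not in e.simple_attrs]
        pk = list(e.key_attrs) + [a for a in owner_pk if a not in e.key_attrs]
        return Relation(e.name, attrs, pk)


    employee = EntityType("EMPLOYEE", ["SSN", "Name"], ["SSN"])
    dependent = EntityType("DEPENDENT", ["DepName", "Relationship"], ["DepName"], owner=employee)
    print(map_entity(employee))   # Relation(name='EMPLOYEE', attrs=['SSN', 'Name'], pk=['SSN'])
    print(map_entity(dependent))  # Relation(name='DEPENDENT', attrs=['DepName', 'Relationship', 'SSN'], pk=['DepName', 'SSN'])
    ```

    Lists are used instead of sets only so that the generated attribute order is predictable; the rules themselves are stated with set union.
    
    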


    solution is proposed which is applied during the conceptual modeling phase in order to generate normalized relation schemas.

    3.1 Violation of 1NF
    1NF disallows relations having a set of values or a tuple as an attribute value for a single tuple. With the introduction of the concept of composite attribute [5], violation of 1NF is no longer possible because the mapping rules always translate a composite attribute into its simpler attributes. These simpler attributes, by definition, always have atomic values.

    3.2 Violation of 2NF
    Consider a relation schema:
    $R(A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_n)$
    with its primary key:
    $K = \{A_1, A_2, \ldots, A_k\}$ where $k$


    Figure 2: Modified ERD eliminating violation of 2NF (diagram label: PROJECT-LOCATION)
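    A 2NF violation of the kind this section addresses is a partial dependency: a non-prime attribute that is determined by a proper subset of a composite primary key. A minimal Python sketch for detecting such dependencies is shown below, assuming FDs are given as pairs of attribute sets; `closure` and `partial_dependencies` are hypothetical helper names, and the StudentID/CourseID schema is an invented illustration, not the paper's Figure 2 example.

    ```python
    from itertools import combinations


    def closure(attrs, fds):
        """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if set(lhs) <= result and not set(rhs) <= result:
                    result |= set(rhs)
                    changed = True
        return result


    def partial_dependencies(key, nonprime, fds):
        """List (X, Y) where X is a proper non-empty subset of the key and
        Y is the set of non-prime attributes that X functionally determines."""
        key = sorted(key)
        found = []
        for size in range(1, len(key)):            # proper subsets only
            for subset in combinations(key, size):
                determined = closure(subset, fds) & set(nonprime)
                if determined:
                    found.append((set(subset), determined))
        return found


    fds = [({"StudentID", "CourseID"}, {"Grade"}), ({"CourseID"}, {"CourseTitle"})]
    print(partial_dependencies({"StudentID", "CourseID"}, {"Grade", "CourseTitle"}, fds))
    # [({'CourseID'}, {'CourseTitle'})]
    ```

    A relation whose primary key has a single attribute never yields a partial dependency in this check, since the loop over proper non-empty subsets is then empty.
    
    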

    a set of non-prime attributes: $NK = \{A_{k+1}, \ldots, A_n\}$.
    3NF is violated only when there exists a transitive dependency due to the presence of at least one FD such that $X \to Y$, where $X, Y \subseteq NK$. Say R is generated by mapping an entity type E. As we show in the following example, in every such case:
    i) the real-world constraint represented by this FD is not represented in the ER model, which actually causes the violation, and
    ii) corresponding to such an FD there should exist a logical entity type E' which is not defined in the original ER model.

    According to the given FD, all possible values of the set of attribute(s) X define unique values of the set of attribute(s) Y, and so X becomes a key attribute of E'. Since E' will form a logical relationship with E, the sets of attributes X and Y should be moved to E'. With this improved figure, we have a better conceptual design, and at the same time its mapping generates relation schemas that satisfy 3NF. It is to be noted that, in this case, X is not a set of prime attributes of the entity type E, and that is why E still remains a strong entity type, as opposed to the case discussed in section 3.2. This can be summarized in rule 2 as given below.
    RULE 2:
    For every FD $X \to Y$ where $X, Y \subseteq Attr(E) - KeyAttr(E)$:

    i) create an entity type E' such that: $Attr(E') = X \cup Y$, and $KeyAttr(E') = X$
    ii) create a relationship type R between E and E'
    iii) $Attr(E) = Attr(E) - (X \cup Y)$
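    Rule 2 can be read as a transformation on the conceptual model. The sketch below applies it to the EMP example of this section; the `Entity` and `Relationship` classes, the `apply_rule_2` function and the relationship name `WORKS_FOR` are hypothetical, chosen only for illustration.

    ```python
    from dataclasses import dataclass


    @dataclass
    class Entity:
        name: str
        attrs: set[str]
        key: set[str]


    @dataclass
    class Relationship:
        name: str
        between: tuple[str, str]


    def apply_rule_2(e, x, y, new_entity, new_relationship):
        """Apply Rule 2 for an FD X -> Y over non-key attributes of entity type e."""
        x, y = set(x), set(y)
        non_key = e.attrs - e.key
        if not (x <= non_key and y <= non_key):
            raise ValueError("X and Y must be subsets of Attr(E) - KeyAttr(E)")
        # i) E' with Attr(E') = X ∪ Y and KeyAttr(E') = X
        e_prime = Entity(new_entity, x | y, set(x))
        # ii) a relationship type R between E and E'
        r = Relationship(new_relationship, (e.name, new_entity))
        # iii) Attr(E) = Attr(E) - (X ∪ Y)
        e_reduced = Entity(e.name, e.attrs - (x | y), set(e.key))
        return e_reduced, e_prime, r


    emp = Entity("EMP", {"ID", "Name", "Dept#", "DeptName"}, {"ID"})
    emp2, dept, works_for = apply_rule_2(emp, {"Dept#"}, {"DeptName"}, "DEPT", "WORKS_FOR")
    print(sorted(emp2.attrs), sorted(dept.attrs), sorted(dept.key), works_for.between)
    # ['ID', 'Name'] ['Dept#', 'DeptName'] ['Dept#'] ('EMP', 'DEPT')
    ```

    Assuming WORKS_FOR is then mapped as a many-to-one relationship, the usual mapping rule would place Dept# in EMP as a foreign key, giving EMP(ID, Name, Dept#) and DEPT(Dept#, DeptName).
    
    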

    To illustrate this concept, consider a relation schema EMP(ID, Name, Dept#, DeptName) generated from Figure 3.

    Figure 3: ERD violating 3NF

    Now, the following FDs are noted:
    FD1: ID → Name, Dept#, DeptName
    FD2: Dept# → DeptName
    The relation EMP is not in 3NF due to FD2. This FD suggests that Figure 3 should have a better representation, as given in Figure 4, using the solution proposed above. This figure generates the following relations:
    EMP(ID, Name, Dept#)
    DEPT(Dept#, DeptName)
    where each relation now satisfies 3NF.
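    The claim that EMP violates 3NF while the two resulting relations satisfy it can be checked mechanically. A minimal Python sketch, assuming FDs are pairs of attribute sets: it projects FD1 and FD2 onto each relation through attribute closure, finds candidate keys by brute force, and reports the first FD $X \to A$ in which X is not a superkey and A is not prime. The helper names are hypothetical, and the exhaustive search is practical only for small relations.

    ```python
    from itertools import combinations


    def closure(attrs, fds):
        """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result


    def subsets(attrs):
        """Non-empty subsets of `attrs`, smallest first."""
        items = sorted(attrs)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                yield set(combo)


    def candidate_keys(attrs, fds):
        keys = []
        for s in subsets(attrs):  # smallest first, so supersets of a found key are skipped
            if attrs <= closure(s, fds) and not any(k <= s for k in keys):
                keys.append(s)
        return keys


    def violates_3nf(attrs, fds):
        """Return (X, A) for the first projected FD X -> A that breaks 3NF, else None."""
        keys = candidate_keys(attrs, fds)
        prime = set().union(*keys)
        for x in subsets(attrs):
            # FDs projected onto this relation: X -> (X+ ∩ attrs) - X
            for a in sorted((closure(x, fds) & attrs) - x):
                if not any(k <= x for k in keys) and a not in prime:
                    return x, a
        return None


    fds = [({"ID"}, {"Name", "Dept#", "DeptName"}), ({"Dept#"}, {"DeptName"})]
    print(violates_3nf({"ID", "Name", "Dept#", "DeptName"}, fds))  # ({'Dept#'}, 'DeptName')
    print(violates_3nf({"ID", "Name", "Dept#"}, fds))              # None
    print(violates_3nf({"Dept#", "DeptName"}, fds))                # None
    ```
    
    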

    Figure 4: Modified ERD eliminating violation of 3NF

    3.4 Violation of BCNF
    We present a theorem here:
    Theorem: A relation schema $R(A_1, A_2, \ldots, A_n)$ with primary key $K = \{A_1\}$ and a set of non-prime attributes $NK = \{A_2, \ldots, A_n\}$ cannot violate BCNF if R already satisfies 1NF, 2NF and 3NF.
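    The theorem can also be checked exhaustively on a small case. The following Python sketch is an illustration, not the authors' proof technique: it fixes $R(A_1, A_2, A_3, A_4)$ with $A_1 \to A_2, A_3, A_4$, enumerates every combination of additional FDs whose left-hand side is a single attribute from $NK$, keeps only the schemas in which $\{A_1\}$ is still the only candidate key, and asserts that 3NF and BCNF agree on each of them. Restricting the extra FDs to single-attribute left-hand sides is an assumption made to keep the search to 512 combinations; all helper names are hypothetical.

    ```python
    from itertools import combinations

    ATTRS = frozenset({"A1", "A2", "A3", "A4"})
    KEY = frozenset({"A1"})   # the single-attribute primary key K
    NK = ATTRS - KEY          # attributes intended to be non-prime


    def closure(x, fds):
        """Closure of x under fds, where each FD is (lhs frozenset, single rhs attribute)."""
        result = set(x)
        changed = True
        while changed:
            changed = False
            for lhs, a in fds:
                if lhs <= result and a not in result:
                    result.add(a)
                    changed = True
        return result


    def subsets(attrs):
        """Non-empty subsets of attrs, smallest first."""
        items = sorted(attrs)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                yield frozenset(combo)


    def candidate_keys(fds):
        keys = []
        for s in subsets(ATTRS):
            if closure(s, fds) >= ATTRS and not any(k <= s for k in keys):
                keys.append(s)
        return keys


    def implied_nontrivial(fds):
        """Every implied FD X -> A with A not in X."""
        for x in subsets(ATTRS):
            for a in closure(x, fds) - x:
                yield x, a


    def is_bcnf(fds):
        keys = candidate_keys(fds)
        return all(any(k <= x for k in keys) for x, _ in implied_nontrivial(fds))


    def is_3nf(fds):
        keys = candidate_keys(fds)
        prime = set().union(*keys)
        return all(any(k <= x for k in keys) or a in prime
                   for x, a in implied_nontrivial(fds))


    base = [(KEY, a) for a in sorted(NK)]                  # A1 -> A2, A3, A4
    optional = [(frozenset({b}), a) for b in sorted(NK)    # B -> A with B in NK
                for a in sorted(ATTRS - {b})]

    checked = 0
    for r in range(len(optional) + 1):
        for extra in combinations(optional, r):
            fds = base + list(extra)
            if candidate_keys(fds) != [KEY]:
                continue  # some attribute of NK became prime: outside the theorem
            checked += 1
            assert is_3nf(fds) == is_bcnf(fds), extra
    print("schemas satisfying the hypothesis:", checked)
    ```

    Because BCNF always implies 3NF, agreement of the two checks on every schema is exactly the theorem's claim that 3NF implies BCNF under these conditions.
    
    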


    Proof: We prove this theorem by contradiction. Let there be a relation schema R which satisfies 3NF, and let there exist an FD $X \to Y$ due to which R violates BCNF. For the left-hand side of this FD (called the determinant), the following four possibilities can exist:

    i) $X = K$
    ii) $X \subset K$
    iii) $X \supset K$
    iv) $X \not\subseteq K$ and $K \not\subseteq X$

    These cases are discussed below.

    Case (i): Since the determinant is the complete key, BCNF is satisfied, contradicting our hypothesis.

    Case (ii): If the determinant is only a part of the primary key, then 2NF is violated, which contradicts our hypothesis.

    Case (iii): Since $X \supset K$, X is a superkey, so the FD is always reducible to $K \to Y$, and the rest of the proof is the same as in case (i).

    Case (iv): This case is discussed under two possibilities:
    a) $X \cap K \neq \emptyset$, or
    b) $X \cap K = \emptyset$.

    For case (iv-a), let $X \cap K = Z$ and $c \in Z$. Then $c \in K$. Since K is a singleton set, $K = \{c\} \subseteq X$, which contradicts the condition $K \not\subseteq X$ of case (iv). Case (iv-b), on the other hand, implies that X consists only of non-prime attributes. Y can now have the same four possibilities:
    1. $Y = K$
    2. $Y \subset K$
    3. $Y \supset K$
    4. $Y \not\subseteq K$ and $K \not\subseteq Y$
    which are discussed one by one.

    Case (iv-b-1): $X \to Y$ and $Y = K$ imply $X \to K$, which implies that X is a superkey. Hence BCNF is satisfied and the hypothesis is false.

    Case (iv-b-2) cannot produce a violation: K is a singleton set, so its only proper subset is $\emptyset$, and $X \to \emptyset$ is a trivial FD.

    Case (iv-b-3): $X \to Y$ and $Y \supset K$ imply $X \to K$, which implies that X is a superkey. Hence BCNF is satisfied and the hypothesis is false.

    Case (iv-b-4): As discussed in case (iv), two possibilities may exist: $Y \cap K \neq \emptyset$ or $Y \cap K = \emptyset$. Since K is a singleton, the first possibility contradicts $K \not\subseteq Y$, whereas in the second possibility $X \cap K = \emptyset$ and $Y \cap K = \emptyset$ imply $X, Y \subseteq NK$. Then $X \to Y$ violates 3NF, which makes the hypothesis false.

    Hence, violation of BCNF is not possible under the given conditions.

    4. CONCLUSION
    In this paper we have discussed problems that cause violation of normal forms, given an ERD and a set of functional dependencies from the real world. It has been elaborated that such violations occur due to improper or insufficient representation of real-world constraints in the corresponding ER model. This paper has shown that i) violation of 1NF is not possible if composite attributes (if any) are represented, and ii) violation of 2NF and 3NF occurs when some functional dependencies are ignored or improperly represented in the ERD. A set of rules has been defined which states how these dependencies should be represented by adding weak entity types and regular entity types. We have also shown that applying these rules always generates relation schemas that do not violate the respective normal forms. Finally, it has been proved that, for BCNF, violation cannot occur provided that a relation satisfies up to 3NF and its primary key consists of only one attribute. We intend to discuss in future research the case of BCNF with a primary key consisting of multiple attributes. This paper has concluded that the normalization process is not required, at least up to BCNF, if the real-world problem and its constraints are properly represented in the ER model.

    REFERENCES
    1. Chen, P. P. (1976) The entity-relationship model: towards a unified view of data. ACM Transactions on Database Systems, 1(1), pp. 9-36.
    2. Codd, E. F. (1972) Further normalization of the data base relational model. In Rustin
    3. Codd, E. F. (1974) Recent investigations in relational database systems. Proceedings of the IFIP Congress.
    4. Elmasri, R., and Wiederhold, G. (1980) Structural properties of relationships and their representation. NCC, AFIPS, 49.
    5. Elmasri, R., Weeldreyer, J., and Hevner, A. (1985) The category concept: An extension to the entity-relationship model. International Journal on Data and Knowledge Engineering, 1(1).
    6. Elmasri, R., and Navathe, S. (2000) Fundamentals of Database Systems, 3rd Ed. Addison-Wesley.
    7. Engels, G., Gogolla, M., Hohenstein, U., Hülsmann, K., Löhr-Richter, P., Saake, G., and Ehrich, H.-D. (1992) Conceptual modelling of database applications using an extended ER model. Data and Knowledge Engineering, 9(2), pp. 157-204.
    8. Gogolla, M., and Hohenstein, U. (1991) Towards a semantic view of an extended entity-relationship model. ACM Transactions on Database Systems, 16(3), pp. 369-416.
    9. Kent, W. (1981) Consequences of assuming a universal relation. ACM Transactions on Database Systems, 6(4), pp. 539-556.
    10. Kent, W. (1983) The universal relation revisited. ACM Transactions on Database Systems, 8(4), pp. 644-648.
    11. Smith, J., and Smith, D. (1977) Database abstractions: Aggregation and generalization. ACM Transactions on Database Systems, 2(2).
    12. Teorey, T., Yang, D., and Fry, J. (1986) A logical design methodology for relational databases using the extended entity-relationship model. ACM Computing Surveys, 18(2), pp. 197-222.
    13. Ullman, J. D. (1988) Principles of Database and Knowledge-Base Systems, Vol. 1. Computer Science Press.
    14. Ullman, J. D. (1990) Principles of Database Systems, 2nd Ed. Computer Science Press.