Week 7-8: Normalization

Normalization

Normalization is the process of efficiently organizing data in a database, with two goals in mind:

  • First goal: eliminate redundant data (for example, storing the same data in more than one table).
  • Second goal: ensure data dependencies make sense (for example, only storing related data in a table).

Benefits of Normalization

  • Less storage space
  • Quicker updates
  • Less data inconsistency
  • Clearer data relationships
  • Easier to add data
  • Flexible structure

Bad database designs result in:

  • Redundancy: inefficient storage.
  • Anomalies: data inconsistency, difficulties in maintenance.

Example

Relational schema: Product(Name, Price, Category, Manufacturer)

Instance:

Name          Price     Category      Manufacturer
gizmo         $19.99    gadgets       GizmoWorks
Power gizmo   $29.99    gadgets       GizmoWorks
SingleTouch   $149.99   photography   Canon
MultiTouch    $203.99   household     Hitachi

First Normal Form (1NF)

A database schema is in First Normal Form if all tables are flat.

Student (not flat: the Courses column holds a nested list):

Name    GPA   Courses
Alice   3.8   Math, DB, OS
Bob     3.7   DB, OS
Carol   3.9   Math, OS

The same data as flat tables:

Student
Name    GPA
Alice   3.8
Bob     3.7
Carol   3.9

Course
Course
Math
DB
OS

Takes
Student   Course
Alice     Math
Carol     Math
Alice     DB
Bob       DB
Alice     OS
Bob       OS
Carol     OS

Keys may need to be added.

Functional Dependencies

  • A form of constraint, and hence part of the schema.
  • Finding them is part of the database design.
  • They are also used in normalizing the relations.

Warning: this is the most abstract, and hardest, part of database design.
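The 1NF example above (a Student table whose Courses column holds a list) can be sketched in a few lines of Python. This is only an illustration: the function name `flatten_students` and the in-memory representation are assumptions, and the nested rows follow one reading of the slide (Alice: Math, DB, OS; Bob: DB, OS; Carol: Math, OS).

```python
# Nested (non-1NF) Student table: the "Courses" value is a list, not an atomic value.
# The course lists are an assumed reading of the slide's layout.
nested_students = [
    {"Name": "Alice", "GPA": 3.8, "Courses": ["Math", "DB", "OS"]},
    {"Name": "Bob", "GPA": 3.7, "Courses": ["DB", "OS"]},
    {"Name": "Carol", "GPA": 3.9, "Courses": ["Math", "OS"]},
]


def flatten_students(rows):
    """Split a nested Student table into flat Student, Course and Takes tables."""
    # Student keeps only the atomic attributes.
    student = [(r["Name"], r["GPA"]) for r in rows]
    # Takes gets one (student, course) row per element of the nested list.
    takes = [(r["Name"], c) for r in rows for c in r["Courses"]]
    # Course lists each distinct course once.
    course = sorted({c for _, c in takes})
    return student, course, takes


student, course, takes = flatten_students(nested_students)
for row in takes:
    print(row)
```

Every value in the three resulting tables is atomic, which is what "flat" means here; the pair (Student, Course) would serve as the key of Takes.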

Functional Dependencies

Definition: if two tuples agree on the attributes A1, A2, ..., An, then they must also agree on the attributes B1, B2, ..., Bm.

Formally:

A1, A2, ..., An → B1, B2, ..., Bm

Examples

  • EmpID → Name, Phone, Position
  • Position → Phone
  • but not Phone → Position

EmpID   Name    Phone   Position
E0045   Smith   1234    Clerk
E1847   John    9876    Salesrep
E1111   Smith   9876    Salesrep
E9999   Mary    1234    Lawyer

In General

To check A → B, erase all other columns and check whether the remaining relation is many-one (called functional in mathematics):

A     B
X1    Y1
X2    Y2
...   ...

Example

Erasing every column of the employee table except Position and Phone leaves:

Position   Phone
Clerk      1234
Salesrep   9876
Salesrep   9876
Lawyer     1234

Each Position maps to exactly one Phone, so Position → Phone holds.

Typical Examples of FDs

  • Product: name → price, manufacturer
  • Person: ssn → name, age
  • Company: name → stockprice, president

Example

Product(name, category, color, department, price)

Consider these FDs:

  • name → color
  • category → department
  • color, category → price

What do they say?

Example

FDs are constraints on relations: on some instances they hold, on others they don't.

name      category   color   department   price
Gizmo     Gadget     Green   Toys         49
Tweaker   Gadget     Green   Toys         99

Does this instance satisfy all the FDs (name → color; category → department; color, category → price)?

Example

name      category     color   department     price
Gizmo     Gadget       Green   Toys           49
Tweaker   Gadget       Black   Toys           99
Gizmo     Stationary   Green   Office-supp.   59

What about this one?

Example

If some FDs are satisfied, then others are satisfied too. If all these FDs are true:

  • name → color
  • category → department
  • color, category → price

then this FD also holds:

  • name, category → price

Why?

Inference Rules for FDs

Splitting rule and combining rule:

A1, A2, ..., An → B1, B2, ..., Bm

is equivalent to

A1, A2, ..., An → B1
A1, A2, ..., An → B2
...
A1, A2, ..., An → Bm

Inference Rules for FDs (continued)

Trivial rule:

A1, A2, ..., An → Ai, where i = 1, 2, ..., n

Why?

Inference Rules for FDs (continued)

Transitive closure rule: if

A1, A2, ..., An → B1, B2, ..., Bm

and

B1, B2, ..., Bm → C1, C2, ..., Cp

then

A1, A2, ..., An → C1, C2, ..., Cp

Why?
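The two questions raised in the Product examples above (does an instance satisfy an FD, and why does name, category → price follow from the other FDs) can be explored with a small Python sketch. The helper names `satisfies` and `closure` are assumptions chosen for illustration. `satisfies` applies the "erase the other columns and check many-one" test; `closure` repeatedly applies the given FDs, which combines the trivial, combining and transitive rules.

```python
def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


COLUMNS = ("name", "category", "color", "department", "price")
FDS = [
    (("name",), ("color",)),
    (("category",), ("department",)),
    (("color", "category"), ("price",)),
]

instance_1 = [dict(zip(COLUMNS, r)) for r in [
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Green", "Toys", 99),
]]
instance_2 = [dict(zip(COLUMNS, r)) for r in [
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Black", "Toys", 99),
    ("Gizmo", "Stationary", "Green", "Office-supp.", 59),
]]

results_1 = [satisfies(instance_1, lhs, rhs) for lhs, rhs in FDS]
results_2 = [satisfies(instance_2, lhs, rhs) for lhs, rhs in FDS]
print(results_1)  # [True, True, False]
print(results_2)  # [True, True, True]

# name gives color, and color together with category gives price.
print("price" in closure({"name", "category"}, FDS))  # True
```

In the first instance both rows share (Green, Gadget) but have different prices, so color, category → price fails; the second instance satisfies all three FDs.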
Functional Dependencies

We use functional dependencies to:

  • Test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
  • Specify constraints on the set of legal relations. We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.

Functional Dependencies

  • K is a superkey for relation schema R if and only if K → R.
  • K is a candidate key for R if and only if K → R, and for no proper subset α of K does α → R hold.

Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:

bor_loan = (customer_id, loan_number, amount)

We expect this functional dependency to hold:

loan_number → amount

but would not expect the following to hold:

amount → customer_name

Functional Dependencies

A functional dependency is trivial if its right-hand side is a subset of its left-hand side, so that it is satisfied by every instance of a relation.

Example:

  • customer_name, loan_number → customer_name
  • customer_name → customer_name

Functional Dependencies

Consider the relation:

PLOTS(Prop#, State, Plot#, Area, Price, Tax_rate)

It holds information about plots available in India. The constraints on the relation are:

  • Prop# is unique throughout India.
  • Plot# values are unique within a given state.
  • For a given state, Tax_rate is fixed.
  • Plots having the same area have the same price, irrespective of the state in which they are located.

Write all the FDs on the relation PLOTS.

Functional Dependencies

Dependency diagram for PLOTS(Prop#, State, Plot#, Area, Price, Tax_rate):

  • FD1 (PK): Prop# → State, Plot#, Area, Price, Tax_rate
  • FD2 (CK): State, Plot# → Prop#, Area, Price, Tax_rate
  • FD3: State → Tax_rate
  • FD4: Area → Price

Identify redundancy in PLOTS. Identify update anomalies in PLOTS.
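The superkey and candidate key definitions above can be checked mechanically for PLOTS. The following is a minimal sketch, assuming the four FDs are read off the stated constraints (Prop# determines everything, State with Plot# determines Prop#, State determines Tax_rate, Area determines Price); the helper names `closure` and `candidate_keys` are illustrative, and the brute-force search is only sensible for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return the minimal attribute sets K with K -> R, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            # Skip supersets of a key already found: they are superkeys
            # but not minimal, so they cannot be candidate keys.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys


PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}
# FDs assumed from the constraints listed for PLOTS.
PLOTS_FDS = [
    (("Prop#",), ("State", "Plot#", "Area", "Price", "Tax_rate")),
    (("State", "Plot#"), ("Prop#", "Area", "Price", "Tax_rate")),
    (("State",), ("Tax_rate",)),
    (("Area",), ("Price",)),
]

keys = candidate_keys(PLOTS, PLOTS_FDS)
print(keys)  # [('Prop#',), ('Plot#', 'State')]
```

The search finds Prop# and the pair (State, Plot#), matching the PK and CK labels in the dependency diagram. Any attribute set containing one of them is a superkey.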
Functional Dependencies

Dependency diagram for PLOTS after decomposition:

  • (Prop#, State, Plot#, Area): FD1 (PK: Prop#), FD2 (CK: State, Plot#)
  • (Area, Price): FD4
  • (State, Tax_rate): FD3

Dependency Diagram (1NF)

[Figure 4.4: Dependency diagram (1NF)]

Conversion to 1NF

  • A relational schema R is in first normal form if the domains of all attributes of R are atomic.
  • Repeating groups must be eliminated.
  • A proper primary key must be developed:
      • It uniquely identifies attribute values (rows).
      • Here it is the combination of PROJ_NUM and EMP_NUM.
  • Dependencies can be identified:
      • Desirable dependencies are based on the primary key.
      • Less desirable dependencies:
          • Partial: based on part of a composite primary key.
          • Transitive: one nonprime attribute depends on another nonprime attribute.

1NF Summarized

  • Each attribute must be atomic (single value).
  • No repeating columns within a row (composite attributes).
  • No multi-valued columns.
  • All key attributes defined.
  • All attributes dependent on the primary key.
  • 1NF simplifies attributes; queries become easier.

Conversion to 2NF

Start with the 1NF format:

  • Write each key component on a separate line.
  • Write the original key on the last line.
  • Each component is a new table.
  • Write the dependent attributes after each key.

PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)

Second Normal Form (2NF)

  • Each attribute must be functionally dependent on the primary key.
  • If the primary key is a single attribute, then the relation is in 2NF.
  • The test for 2NF involves testing for FDs whose left-hand-side attributes are part of the primary key.
  • Disallow partial dependencies, where non-key attributes depend on part of a composite primary key.
  • In short, remove partial dependencies.
  • 2NF improves data integrity and prevents update, insert, and delete anomalies.
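The conversion steps above can be sketched in Python for the PROJ_NUM/EMP_NUM example. This is an illustration under stated assumptions: the FDs are inferred from the PROJECT, EMPLOYEE and ASSIGN tables listed above, JOB_CLASS → CHG_HOUR is an assumed dependency between nonprime attributes, and the names `classify` and `to_2nf` are hypothetical.

```python
PRIMARY_KEY = frozenset({"PROJ_NUM", "EMP_NUM"})

# FDs assumed for the example; the last one is an assumed transitive dependency.
FDS = [
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
    (("JOB_CLASS",), ("CHG_HOUR",)),
]


def classify(lhs, primary_key):
    """Label an FD by how its determinant relates to the primary key."""
    lhs = frozenset(lhs)
    if lhs == primary_key:
        return "full"        # depends on the entire primary key
    if lhs < primary_key:
        return "partial"     # depends on only part of a composite key
    if lhs.isdisjoint(primary_key):
        return "transitive"  # determinant is made of nonprime attributes
    return "other"


def to_2nf(primary_key, fds):
    """Create one table per key component plus one for the whole key."""
    tables = {}
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs <= primary_key:
            # Write the dependent attributes after each key (component).
            tables.setdefault(lhs, set()).update(rhs)
    return tables


labels = [classify(lhs, PRIMARY_KEY) for lhs, _ in FDS]
tables = to_2nf(PRIMARY_KEY, FDS)
for key, attrs in tables.items():
    print(sorted(key), "->", sorted(attrs))
```

The three resulting groups correspond to PROJECT, EMPLOYEE and ASSIGN. The FD labelled "transitive" is untouched by this step, so CHG_HOUR stays in EMPLOYEE.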
2NF Conversion Results

[Figure 4.5: 2NF conversion results]

2NF

  • Based on the concept of full functional dependencies (FFD).
  • If A and B are sets of attributes of R, B is said to be fully functionally dependent on A if A → B, but no proper subset of A determines B.
  • No partial dependencies on the PK.
  • Is PLOTS in 2NF? Yes: it has a single-attribute PK.
  • All relations with a single-attribute PK are in 2NF. 2NF applies to relations with composite keys.

2NF

A relation that is in 1NF, and in which every non-PK attribute is fully functionally dependent on the PK, is said to be in 2NF.

1NF → (remove all partial dependencies) → 2NF

2NF Summarized

  • In 1NF.
  • Includes no partial dependencies: no attribute is dependent on a portion of the primary key.
  • Still possible to exhibit transitive dependency: attributes may be functionally dependent on nonkey attributes.

Conversion to 3NF

Create separate table(s) to eliminate transitive functional dependencies:

PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)

3NF

  • Based on the concept of transitive dependency.
  • No non-PK attribute should be transitively dependent on the PK.
  • Transitive dependency: if A → B and B → C, then A transitively determines C through B, provided B and C do not determine A.
  • Is PLOTS in 3NF? No.

3NF

PLOTS(Prop#, State, Plot#, Area, Price, Tax_rate), with FD1 (PK), FD2 (CK), FD3 and FD4:

  • Prop# transitively determines Tax_rate through State.
  • Prop# transitively determines Price through Area.

3NF

A relation that is in 1NF and 2NF, and in which no non-PK attribute is transitively dependent on the PK, is said to be in 3NF.

2NF → (remove all transitive dependencies) → 3NF

2NF Example - 1

Inventory (Item, Supplier, Cost, Supplier Address)

We first check whether Cost is fully functionally dependent upon the ENTIRE primary key.

  • If I know just Item, can I find out Cost? No. We can have more than one supplier for the same product.
  • If I know just Supplier, can I find out Cost? No. We need to know what the Item is as well.

So, Cost is fully functionally dependent upon the ENTIRE primary key.

2NF Example - 2

Inventory (Item, Supplier, Cost, Supplier Address)

We then check whether Supplier Address is fully functionally dependent upon the ENTIRE primary key.

  • If I know just Item, can I find out Supplier Address? No. We can have more than one supplier for the same product.
  • If I know just Supplier, can I find out Supplier Address? Yes. The supplier's address does not depend on the Item.

So, Supplier Address is NOT fully functionally dependent upon the ENTIRE primary key, and Inventory is NOT in 2NF.

So, putting things together:

Inventory (Description, Supplier, Cost, Supplier Address)

is decomposed into

Inventory (Description, Supplier, Cost)
Supplier (Name, Supplier Address)

The resulting relations are now in 2NF, since no non-key attribute depends on only part of a primary key.

Transitive Dependence

Given a relation R(EmpNo, EName, Salary, Address), assume the following FDs hold:

  • EmpNo → EName
  • EName → Address
  • EmpNo → Address (transitively)

Note: both EName and Address are non-key attributes in R. Since Address depends on a non-prime attribute, EName, which in turn depends on the primary key (EmpNo), a transitive dependency exists.

R is decomposed into:

R1(EmpNo, EName, Salary)
R2(EName, Address)

Boyce-Codd Normal Form (BCNF)

  • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values within a row.)
  • If a table contains only one candidate key, 3NF and BCNF are equivalent.
  • BCNF is a special case of 3NF.

Database Normalization

[Figure 5.7: A table that is in 3NF but not in BCNF]

The Decomposition of a Table Structure to Meet BCNF R