Database Design

Contents: Introduction; Functional Dependencies; Functional Dependency Theory; Normalization; Dependency Preservation; Boyce-Codd Normal Form; Multivalued Dependencies and Fourth Normal Form; Join Dependencies and Fifth Normal Form



Introduction

Design goal: decide whether a particular relation R is in good form. If a relation R is not in good form, decompose it into a set of relations {R1, R2, ..., Rn} such that:
- each relation is in good form, and
- the decomposition is a lossless-join decomposition: when the smaller relations (with fewer attributes) are joined, the result is exactly the original relation, without any extra rows. The joins can be performed in any order.

A bad design may lead to:
- repetition of information, which leads to insert, delete and update anomalies;
- inability to represent some information.

Anomalies are unexpected results from an operation:
- Delete anomaly: when deleting a value for one attribute, you inadvertently lose the value of some other attribute.
- Insert anomaly: you need to store a value for a particular attribute but cannot, because some other value (typically the key value) is needed to create that occurrence.
- Update anomaly: like an insert anomaly, but to change a value you need to find all of its instances, which may be hard to do.

Functional Dependency

Functional dependencies are constraints on the set of legal relations. They require that the value for a certain set of attributes uniquely determines the value for another set of attributes.

Definition of FD: given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X -> Y), if and only if each X value is associated with at most one Y value.

Determinant: attribute X is a determinant if it uniquely determines the value of Y in a given relationship or entity. A determinant need NOT be a key attribute. This is written X -> Y, meaning that attribute X decides attribute Y.

Example of FD

Employee
SSN | Name | JobType | DeptName
557-78-6587 | Lance Smith | Accountant | Salary
214-45-2398 | Lance Smith | Engineer | Product

SSN -> Name

Note: Name is functionally dependent on SSN because an employee's name can be uniquely determined from their SSN. Name does not determine SSN, because more than one employee can have the same name.

Keys
- Key: a unique attribute (or field) that can be used to identify the entire tuple (or record) as unique. Key attributes are determinants, but not all determinants are key attributes, e.g. Marks -> Grade.
- Candidate key: any attribute, or minimal combination of attributes, that could serve as a key.
- Primary key: the candidate key selected by the database administrator as the key used for that relation.
- Composed (or composite) key: a key of two or more fields.

Consider the following relation:
REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade)
where:
- Student#: student number
- Course#: course number
- CourseName: course name
- IName: name of the instructor who delivered the course
- Room#: room number assigned to the respective instructor
- Marks: marks scored in course Course# by student Student#
- Grade: grade obtained by student Student# in course Course#
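A relation instance can refute a functional dependency but never prove it, because an FD is a constraint on all legal relations, not just the rows currently stored. As a minimal sketch, assuming each tuple is represented as a Python dict keyed by attribute name (a hypothetical representation chosen for illustration), the function below checks whether X -> Y is satisfied by the two Employee rows above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies lhs -> rhs.

    Each X value may be associated with at most one Y value, so we
    remember the first Y seen for every X and reject any conflict.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


# The two Employee tuples from the example above.
employee = [
    {"SSN": "557-78-6587", "Name": "Lance Smith", "JobType": "Accountant", "DeptName": "Salary"},
    {"SSN": "214-45-2398", "Name": "Lance Smith", "JobType": "Engineer", "DeptName": "Product"},
]

print(fd_holds(employee, ["SSN"], ["Name"]))  # True
print(fd_holds(employee, ["Name"], ["SSN"]))  # False: one name, two SSNs
```

A True result only means this instance does not contradict the dependency; whether SSN -> Name holds is a design decision about the data.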

Student# and Course# together (a composite attribute) determine EXACTLY ONE value of Marks. This can be written symbolically as Student#, Course# -> Marks.

Other functional dependencies in the above example are:
- Course# -> CourseName
- Course# -> IName (assuming each course is taught by one and only one instructor)
- IName -> Room# (assuming each instructor has his/her own, non-shared room)
- Marks -> Grade

Formal definition of FD: in a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines exactly one value of Y. This is written X -> Y. X may be composite.

A functional dependency is trivial if it is satisfied by all instances of a relation. Examples: customer_name, loan_number -> customer_name; customer_name -> customer_name. In general, X -> Y is trivial if Y ⊆ X.

Full functional dependency: in a given relation R, X and Y are attributes. Y is fully functionally dependent on X if it is functionally dependent on X and not on any proper subset of X. X may be composite. E.g. Marks is fully functionally dependent on Student#, Course# and not on any subset of it. CourseName is not fully functionally dependent on Student#, Course#, because the subset Course# alone determines CourseName.

Partial dependency: in a given relation R, X and Y are attributes. Y is partially dependent on X if it is dependent on a proper subset of X. X may be composite. E.g. CourseName, IName and Room# are partially dependent on the composite attribute Student#, Course#, because Course# alone determines CourseName, IName and Room#.

FD-Partial Dependency
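Partial dependencies can be found mechanically: for every proper subset of the key, compute which attributes it determines (its attribute closure) and keep the non-key ones. The sketch below assumes the REPORT dependencies listed above, including the two stated assumptions about instructors and rooms; the function names are illustrative, not from any library.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """Map each proper subset of `key` to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


REPORT_ATTRS = ["Student#", "Course#", "CourseName", "IName", "Room#", "Marks", "Grade"]
REPORT_KEY = ["Student#", "Course#"]
# The dependencies listed in the text, including its two stated assumptions.
REPORT_FDS = [
    (["Student#", "Course#"], ["Marks"]),
    (["Course#"], ["CourseName"]),
    (["Course#"], ["IName"]),
    (["IName"], ["Room#"]),
    (["Marks"], ["Grade"]),
]

for subset, attrs in partial_dependencies(REPORT_KEY, REPORT_FDS, REPORT_ATTRS).items():
    print(subset, "->", sorted(attrs))
# ('Course#',) -> ['CourseName', 'IName', 'Room#']
```

Marks and Grade do not appear because no proper subset of the key determines them, i.e. they depend fully on Student#, Course#.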

FD: Transitive Dependency

Transitive dependency: Room# depends on IName, which in turn depends on Course#. Here Room# transitively depends on Course#. Similarly, Grade depends on Marks, which in turn depends on Student#, Course#; hence Grade transitively depends on Student#, Course#.

Closure

Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. For example, if A -> B and B -> C, then we can infer that A -> C. The set of all functional dependencies logically implied by F is the closure of F, denoted F+. F+ is a superset of F.

Axioms

Developed by Armstrong in 1974, there are three rules (axioms) from which all functional dependencies implied by F can be derived; three further rules can be derived from them.
1. Reflexivity rule: if X is a set of attributes and Y ⊆ X, then X -> Y holds, i.e. each subset of X is functionally dependent on X.
2. Augmentation rule: if X -> Y holds and W is a set of attributes, then WX -> WY holds.
3. Transitivity rule: if X -> Y and Y -> Z hold, then X -> Z holds.
These rules are sound (they generate only functional dependencies that actually hold) and complete (they generate all functional dependencies that hold).

Derived Theorems from Axioms
4. Union rule: if X -> Y and X -> Z hold, then X -> YZ holds.
5. Decomposition rule: if X -> YZ holds, then so do X -> Y and X -> Z.
6. Pseudotransitivity rule: if X -> Y and WY -> Z hold, then so does WX -> Z.
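Generating all of F+ with the rules above is impractical, since F+ can be exponentially large. A common alternative is to test a single dependency X -> Y by computing the attribute closure X+ (everything X determines) and checking whether Y ⊆ X+. The sketch below is a minimal version of that algorithm, assuming single-letter attribute names so that a string such as "CG" stands for the set {C, G}; it uses R = (A, B, C, G, H, I) and F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}, the same schema as the example that follows.

```python
def closure(attrs, fds):
    """Compute X+ by repeatedly firing every FD whose left side is already in X+."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is contained in the closure X+."""
    return set(rhs) <= closure(lhs, fds)


# Single-letter attributes, so "CG" stands for the attribute set {C, G}.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'H']
print(implies(F, "A", "H"))      # True
print(implies(F, "AG", "I"))     # True
print(implies(F, "A", "I"))      # False
```

Since closure("AG", F) contains every attribute of R, AG is also a superkey of R; the same closure routine is the usual building block for key and normal-form tests.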

Example
R = (A, B, C, G, H, I)
F = { A -> B, A -> C, CG -> H, CG -> I, B -> H }
Some members of F+:
- A -> H, by transitivity from A -> B and B -> H
- AG -> I, by augmenting A -> C with G to get AG -> CG, and then transitivity with CG -> I

Introduction to Normalization

Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations.
Goals:
- eliminating redundant data;
- ensuring data dependencies make sense (only storing related data in a table).
These goals reduce the amount of space a database consumes and ensure that data is logically stored.

Need for Normalization
- Minimize data redundancy, i.e. no unnecessary duplication of data.
- Make the database structure flexible, i.e. it should be possible to add new data values and rows without reorganizing the database structure.
- Keep data consistent throughout the database, i.e. it should not suffer from the following anomalies: insert anomaly, update anomaly, delete anomaly.

ADVANTAGES OF NORMALIZATION

- More efficient data structure.
- Avoids redundant fields or columns.
- More flexible data structure, i.e. new rows and data values can be added easily.
- Better understanding of the data.
- Ensures that distinct tables exist when necessary.
- Easier to maintain data structure, i.e. operations are easy to perform and complex queries can be handled easily.
- Minimizes data duplication.
- Close modeling of real-world entities, processes and their relationships.

DISADVANTAGES OF NORMALIZATION

- You cannot start building the database before you know what the user needs.
- When relations are normalized to higher normal forms (4NF, 5NF), performance can degrade, since more joins are needed to reassemble the data.
- Normalizing relations of higher degree is a time-consuming and difficult process.
- Careless decomposition may lead to a bad database design, which may cause serious problems.

Normal Forms

Codd (1972) initially presented three normal forms (1NF, 2NF and 3NF), all based on functional dependencies among the attributes of a relation. Later, Boyce and Codd proposed another normal form, the Boyce-Codd normal form (BCNF). The fourth and fifth normal forms, based on multivalued and join dependencies, were proposed later. The primary objective of normalization is to avoid anomalies.

Normal Forms: Review
- Unnormalized: there are multivalued attributes or repeating groups.
- 1NF: no multivalued attributes or repeating groups.
- 2NF: 1NF plus no partial dependencies.
- 3NF: 2NF plus no transitive dependencies.

Example Relation Record

First Normal Form(1NF)

A relation R is said to be in first normal form (1NF) if and only if all the attributes of R are atomic in nature.

That means only one piece of data can be stored within the field (attribute) of a particular record (tuple).

Non-atomic values complicate storage and encourage redundant (repeated) storage of data.

Example Relation Record

E.g. student details are repeated for each course, and course details are repeated for each student. To avoid this, Student Details, Course Details and Results can be further divided:
- Student Details is divided into Student# (student number), StudentName and DateOfBirth.
- Course Details is divided into Course#, CourseName, PreRequisite and DurationInDays.
- Results is divided into Student#, Course#, DateOfExam, Marks and Grade.

Student Table

Course Table

Result Table

Second Normal Form (2NF)

A relation is said to be in second normal form (2NF) if and only if:
- it is in first normal form, and
- no partial dependency exists between non-key attributes and key attributes.

Let us revisit the 1NF table structure. Student# is the key attribute for Student, Course# is the key attribute for Course, and Student# and Course# together form the composite key for Result. All other attributes are non-key attributes.

To make this table 2NF compliant, we have to remove all the partial dependencies:
- StudentName and DateOfBirth depend only on Student#.
- CourseName, PreRequisite and DurationInDays depend only on Course#.
- DateOfExam depends only on Course#.
To remove this last partial dependency, we split the table Result into two tables:
1. Result(Student#, Course#, Marks, Grade)
2. Exam(Course#, DateOfExam)

Result and Exam Table

In the first table (STUDENT), the key attribute is Student#, and all the non-key attributes, StudentName and DateOfBirth, are fully functionally dependent on the key attribute. In the second table (COURSE), Course# is the key attribute, and all the non-key attributes, CourseName, PreRequisite and DurationInDays, are fully functionally dependent on the key attribute. In the third table (RESULT), Student# and Course# together are the key attributes, and all the non-key attributes, Marks and Grade, are fully functionally dependent on the key attributes. In the fourth table (EXAM), Course# is the key attribute, and the non-key attribute DateOfExam is fully functionally dependent on the key attribute.


What about anomalies? At first glance it appears that all our anomalies are gone! We now store the record for student 1003 and for course M4 only once. We can insert prospective students and courses at will. We update only once if we need to change any data in the STUDENT or COURSE tables. We can remove any course or student details by deleting just one row.

Let us analyse the RESULT table. We have already concluded that:
- all attributes are atomic in nature;
- no partial dependency exists between the key attributes and the non-key attributes.
So the RESULT table is in 2NF.

Assume that, under the current university evaluation policy:
- students who score 80 marks or more are awarded an A grade;
- students who score 70 to 79 marks are awarded a B grade;
- students who score 60 to 69 marks are awarded a C grade;
- students who score 50 to 59 marks are awarded a D grade.

The university management, committed to improving the quality of education, wants to replace the existing grading system with a new one. With the present RESULT table structure:
- we have no way to introduce new grades such as A+, B- and E;
- we need to update many existing records to bring them to the new grade definitions;
- we cannot remove the D grade if we want to.

So 2NF does not take care of all anomalies and inconsistencies.

Third Normal Form (3NF)

A relation R is said to be in 3NF if and only if:
- it is in 2NF, and
- no transitive dependency exists between non-key attributes and key attributes.

In the RESULT table, Student# and Course# are the key attributes. All other attributes except Grade depend on the key neither partially nor transitively. Grade depends on Marks, and Marks in turn depends on Student#, Course#. To bring the table into 3NF we need to remove this transitive dependency.

Third Normal Form (3NF): Result & Grade Table
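The transitive dependency can also be detected with the usual 3NF test: a non-trivial X -> A is allowed only if X is a superkey or A is a prime attribute (part of some candidate key). The sketch below assumes RESULT has exactly the two dependencies Student#, Course# -> Marks and Marks -> Grade, with {Student#, Course#} as its only candidate key, and checks only those listed dependencies, which is enough for this small example. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """Return (X, A) pairs for non-trivial X -> A that break 3NF.

    3NF allows X -> A only if X is a superkey or A is a prime attribute
    (an attribute that belongs to some candidate key).
    """
    prime = set().union(*map(set, candidate_keys))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for a in sorted(set(rhs) - set(lhs)):
            if not is_superkey and a not in prime:
                violations.append((tuple(lhs), a))
    return violations


RESULT_ATTRS = ["Student#", "Course#", "Marks", "Grade"]
RESULT_FDS = [(["Student#", "Course#"], ["Marks"]), (["Marks"], ["Grade"])]
RESULT_KEYS = [["Student#", "Course#"]]  # assumed to be the only candidate key

print(third_nf_violations(RESULT_ATTRS, RESULT_FDS, RESULT_KEYS))
# [(('Marks',), 'Grade')]
```

The only violation reported is Marks -> Grade, the transitive dependency that the 3NF step moves into a separate grade table.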

After normalizing the tables to 3NF, we have removed the anomalies and inconsistencies described above: we can now add new grades, update existing ones and delete unwanted ones. For this reason 3NF is often treated as the practical target, and the great majority of databases that require efficient INSERT, UPDATE and DELETE operations are designed in this normal form.

BCNF

A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key.
Formal definition: a table is in BCNF if and only if, for every one of its non-trivial functional dependencies X -> Y (Y not contained in X), X is a superkey, that is, X is either a candidate key or a superset of one.

When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. Most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF, and this happens only if:
(a) the candidate keys in the relation are composite keys (that is, not single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes are common to more than one key.

Difference between 3NF and BCNF

For a functional dependency A -> B, 3NF allows this dependency in a relation if B is a prime attribute (part of some candidate key), even when A is not a candidate key. BCNF, by contrast, insists that for this dependency to remain in the relation, A must be a candidate key (more precisely, a superkey).

Every relation in BCNF is also in 3NF. However, a relation in 3NF is not necessarily in BCNF.

A Table That Is In 3NF But Not In BCNF

The Decomposition of a Table Structure to Meet BCNF Requirements


Sample Data for a BCNF Conversion

Decomposition into BCNF

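BCNF example: Address (not in BCNF)
Scheme: {City, Street, ZipCode}
Key1: {City, Street}
Key2: {ZipCode, Street}
There is no non-key attribute, hence the scheme is in 3NF.
Functional dependencies:
{City, Street} -> {ZipCode}
{ZipCode} -> {City}
The second is a dependency between attributes belonging to a key, and ZipCode alone is not a candidate key, so the scheme is not in BCNF.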
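The BCNF test can be automated with attribute closure: a non-trivial X -> Y violates BCNF when X+ does not cover every attribute of the relation. The sketch below, assuming the two Address dependencies listed above and using illustrative function names, confirms that ZipCode -> City is the offending dependency.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> Y whose left side X is not a superkey."""
    return [
        (tuple(lhs), tuple(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                   # non-trivial
        and not closure(lhs, fds) >= set(attributes)  # X is not a superkey
    ]


ADDRESS = ["City", "Street", "ZipCode"]
ADDRESS_FDS = [(["City", "Street"], ["ZipCode"]), (["ZipCode"], ["City"])]

print(sorted(closure(["ZipCode"], ADDRESS_FDS)))  # ['City', 'ZipCode']
print(bcnf_violations(ADDRESS, ADDRESS_FDS))      # [(('ZipCode',), ('City',))]
```

ZipCode+ is only {ZipCode, City}, so ZipCode is a determinant that is not a candidate key, exactly the situation BCNF forbids.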

To convert to BCNF:
- place the two candidate keys in separate entities;
- place each of the remaining data items in one of the resulting entities according to its dependency on the primary key.

Example 1 (convert to BCNF)
Old scheme: {City, Street, ZipCode}
New scheme 1: {ZipCode, Street}
New scheme 2: {City, Street}
Lost dependency: {ZipCode} -> {City}
Alternate new scheme 1: {ZipCode, Street}
Alternate new scheme 2: {ZipCode, City}

BCNF - Decomposition

If there is a table with columns A, B, C with primary key (A), and C depends on B (B -> C), then to be in 3NF the table becomes:
- a table with columns B, C and primary key (B);
- a table with columns A, B, primary key (A) and foreign key (B).

Dependency Preservation

Properties of decomposition:
- If a decomposition does not cause any loss of information, it is called a lossless decomposition.
- If a decomposition does not cause any dependencies to be lost, it is called a dependency-preserving decomposition.

A decomposition into 3NF that is both lossless and dependency preserving can always be found. A decomposition into BCNF can always be made lossless, but it may not preserve all dependencies.
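Both properties can be checked on the Address example. The sketch below uses two standard textbook tests, under the assumption that the dependencies are exactly those listed for Address: a split of R into R1 and R2 is lossless if R1 ∩ R2 determines all of R1 or all of R2, and a dependency X -> Y is preserved if Y can be reached from X using only what each fragment's projection can infer. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is implied by fds."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


def lost_dependencies(fragments, fds):
    """FDs that cannot be enforced from the projections onto the fragments."""
    lost = []
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for frag in fragments:
                # What the projection onto `frag` lets us infer from `result`.
                t = closure(result & frag, fds) & frag
                if not t <= result:
                    result |= t
                    changed = True
        if not set(rhs) <= result:
            lost.append((tuple(lhs), tuple(rhs)))
    return lost


FDS = [(["City", "Street"], ["ZipCode"]), (["ZipCode"], ["City"])]
scheme = ({"ZipCode", "Street"}, {"City", "Street"})
alternate = ({"ZipCode", "Street"}, {"ZipCode", "City"})

print(is_lossless_binary(*scheme, FDS))     # False: shared Street is not a key of either
print(is_lossless_binary(*alternate, FDS))  # True: shared ZipCode -> City
print(lost_dependencies(alternate, FDS))    # [(('City', 'Street'), ('ZipCode',))]
```

The alternate scheme is lossless and in BCNF, yet it cannot enforce {City, Street} -> {ZipCode} without a join, which illustrates why BCNF decomposition may not preserve all dependencies.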

Example
Grade_report(StudNo, StudName, (Major, Advisor, (CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)))
Functional dependencies:
- StudNo -> StudName
- CourseNo -> Ctitle, InstrucName
- InstrucName -> InstrucLocn
- StudNo, CourseNo, Major -> Grade
- StudNo, Major -> Advisor
- Advisor -> Major

Example cont...
Unnormalised:
Grade_report(StudNo, StudName, (Major, Advisor, (CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)))

1NF: remove repeating groups
Student(StudNo, StudName)
StudMajor(StudNo, Major, Advisor)
StudCourse(StudNo, Major, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)


2NF: remove partial key dependencies
Student(StudNo, StudName)
StudMajor(StudNo, Major, Advisor)
StudCourse(StudNo, Major, CourseNo, Grade)
Course(CourseNo, Ctitle, InstrucName, InstrucLocn)


3NF: remove transitive dependencies
Student(StudNo, StudName)
StudMajor(StudNo, Major, Advisor)
StudCourse(StudNo, Major, CourseNo, Grade)
Course(CourseNo, Ctitle, InstrucName)
Instructor(InstrucName, InstrucLocn)

Example cont...
BCNF: every determinant is a candidate key
- Student: the only determinant is StudNo.
- StudCourse: the only determinant is StudNo, Major, CourseNo.
- Course: the only determinant is CourseNo.
- Instructor: the only determinant is InstrucName.
- StudMajor: the determinants are StudNo, Major and Advisor. StudNo, Major is a candidate key, but Advisor is not, so StudMajor is not in BCNF.
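Enumerating the candidate keys of StudMajor makes the BCNF problem concrete. The sketch below, assuming the two StudMajor dependencies StudNo, Major -> Advisor and Advisor -> Major, tries attribute sets in increasing size and keeps the minimal ones whose closure is the whole relation; the function names are illustrative only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


STUD_MAJOR = ["StudNo", "Major", "Advisor"]
FDS = [(["StudNo", "Major"], ["Advisor"]), (["Advisor"], ["Major"])]

print(candidate_keys(STUD_MAJOR, FDS))
# [('Advisor', 'StudNo'), ('Major', 'StudNo')]

# Determinants that are not superkeys break BCNF.
print([tuple(lhs) for lhs, _ in FDS if not closure(lhs, FDS) >= set(STUD_MAJOR)])
# [('Advisor',)]
```

StudMajor therefore has two overlapping composite candidate keys, {StudNo, Major} and {StudNo, Advisor}, which is the typical shape of a 3NF relation that is not in BCNF; Advisor alone determines Major without being a key.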

Example: BCNF

Student(StudNo, StudName)
StudCourse(StudNo, Major, CourseNo, Grade)
Course(CourseNo, Ctitle, InstrucName)
Instructor(InstrucName, InstrucLocn)
StudMajor(StudNo, Advisor)
Advisor(Advisor, Major)

Problems BCNF overcomes
- If the record for student 456 is deleted, we lose not only the information on student 456 but also the fact that DARWIN advises in BIOLOGY.
- We cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor.

STUDENT | MAJOR | ADVISOR
123 | PHYSICS | EINSTEIN
123 | MUSIC | MOZART
456 | BIOLOGY | DARWIN
789 | PHYSICS | BOHR
999 | PHYSICS | EINSTEIN

Split into two tables. In BCNF we have two tables:

STUDENT | ADVISOR
123 | EINSTEIN
123 | MOZART
456 | DARWIN
789 | BOHR
999 | EINSTEIN

ADVISOR | MAJOR
EINSTEIN | PHYSICS
MOZART | MUSIC
DARWIN | BIOLOGY
BOHR | PHYSICS

Higher Normal Forms
- A relation in BCNF is also in 3NF.
- A relation in 3NF is also in 2NF.
- A relation in 2NF is also in 1NF.

[Figure: nested sets of relations, from the outermost 1NF relations through 2NF, 3NF, BCNF and 4NF to the innermost 5NF relations]

Multivalued Dependency (MVD)

The multivalued dependency X ->> Y holds in a relation R if, whenever two tuples of R agree on all the attributes of X, we can swap their Y components and obtain two new tuples that are also in R.

Trivial: if Y is a subset of X, or X ∪ Y is all the attributes of R.

Non-trivial: if Y is not a subset of X and X ∪ Y is not all the attributes of R.

MVDs are denoted with a double-arrow symbol: X ->> Y.


MVDs require that tuples of a certain form be present in R. Because of this, MVDs are also referred to as tuple-generating dependencies.

When dealing with an MVD, you have to check the whole table: for each X value, every value of one attribute must be repeated with every value of the other attribute to keep the table consistent.
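The swap definition translates directly into a check on an instance: for every pair of tuples t1, t2 that agree on X, the tuple taking its Y values from t1 and all other values from t2 must also be present. The sketch below assumes tuples stored as dicts and uses hypothetical course, teacher and book values; as with FDs, an instance can only refute an MVD, not prove it.

```python
from itertools import product


def mvd_holds(rows, attributes, x, y):
    """Check X ->> Y on an instance by the swap test.

    For every pair t1, t2 agreeing on X, the tuple taking its Y values from
    t1 and all other values from t2 must also be in the relation.
    """
    present = {tuple(row[a] for a in attributes) for row in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            swapped = {**t2, **{a: t1[a] for a in y}}
            if tuple(swapped[a] for a in attributes) not in present:
                return False
    return True


ATTRS = ["Course", "Teacher", "Book"]
# Hypothetical data: every teacher of the course is paired with every book.
ctb = [
    {"Course": "DB", "Teacher": "T1", "Book": "B1"},
    {"Course": "DB", "Teacher": "T1", "Book": "B2"},
    {"Course": "DB", "Teacher": "T2", "Book": "B1"},
    {"Course": "DB", "Teacher": "T2", "Book": "B2"},
]

print(mvd_holds(ctb, ATTRS, ["Course"], ["Teacher"]))       # True
print(mvd_holds(ctb[:-1], ATTRS, ["Course"], ["Teacher"]))  # False: (DB, T2, B2) missing
```

Dropping a single row breaks the dependency, which is why every teacher value must be repeated with every book value.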

MVD v. FD

Here is a visual example of the difference between multivalued dependency and functional dependency.

In both cases, X determines Y.

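Differences:
- FD: each X value determines exactly one Y value (1 to 1 from the point of view of each X value).
- MVD: each X value determines a set of Y values, independently of the other attributes (many to many).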

MVD

There exist anomalies and redundancies in relational schemas that cannot be captured by FDs. Example: consider a table with scheme (Course, Set-of-teachers, Set-of-books). There are no (non-trivial) FDs that hold on this scheme; therefore the scheme is in BCNF.

Axioms for MVDs

Fourth Normal Form (4NF)

A table is in fourth normal form (4NF) if and only if it is in BCNF and, for every non-trivial multivalued dependency X ->> Y, X is a superkey. Informally, the table must not contain more than one independent multivalued dependency. Anomalies can occur in relations in BCNF if there is more than one multivalued dependency: if A ->> B and A ->> C hold but B and C are unrelated (A does not functionally determine (B, C)), then we have more than one independent multivalued dependency.

The CTB table is difficult to maintain, since adding a new book requires multiple new rows, one for each teacher. This problem is created by the pair of multivalued dependencies Course ->> Teacher and Course ->> Book. A much better alternative is to decompose CTB into two relations, one on (Course, Teacher) and one on (Course, Book).
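The effect of the 4NF decomposition can be seen on a small instance. The sketch below uses hypothetical CTB data stored as a set of (Course, Teacher, Book) tuples, with CT and CB as illustrative names for the two projections. It shows that joining the projections on Course rebuilds CTB exactly (the split is lossless) and that adding a book touches one row in CB instead of one row per teacher in CTB.

```python
# Hypothetical CTB instance: (Course, Teacher, Book).
CTB = {
    ("DB", "T1", "B1"), ("DB", "T1", "B2"),
    ("DB", "T2", "B1"), ("DB", "T2", "B2"),
}

# Decompose along Course ->> Teacher (and hence Course ->> Book).
CT = {(c, t) for c, t, _ in CTB}
CB = {(c, b) for c, _, b in CTB}

# Natural join on Course rebuilds the original relation: the split is lossless.
rejoined = {(c, t, b) for c, t in CT for c2, b in CB if c == c2}
print(rejoined == CTB)  # True

# Adding a new book "B3" to course DB:
ctb_rows_needed = {("DB", t, "B3") for c, t in CT if c == "DB"}  # one row per teacher
cb_rows_needed = {("DB", "B3")}                                   # a single row
print(len(ctb_rows_needed), len(cb_rows_needed))  # 2 1
```

With more teachers per course the gap grows linearly, which is the maintenance problem the decomposition removes.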


Lossless-join Decomposition

Join Dependency and Fifth Normal Form


Fifth Normal Form

Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization designed to reduce redundancy in relational databases that record multivalued facts, by isolating semantically related multiple relationships. A table is in 5NF if and only if every join dependency in it is implied by the candidate keys.

Anomalies can occur in relations in 4NF if the primary key has three or more fields.

Example: Buyer, Vendor, Item relation

Domain/Key Normal Form (DKNF)

Domain/key normal form (DKNF) is a normal form used in database normalization which requires that the database contain no constraints other than domain constraints and key constraints. A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table.
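Returning to the Buyer, Vendor, Item relation of the 5NF discussion: a join dependency *(BV, VI, BI) holds when the relation equals the join of its three binary projections. The sketch below, using hypothetical buyer, vendor and item values, computes that join. The join always contains the original tuples, so the test reduces to checking that no spurious tuple appears.

```python
def join_of_projections(rel):
    """Join the (Buyer, Vendor), (Vendor, Item) and (Buyer, Item) projections."""
    bv = {(b, v) for b, v, _ in rel}
    vi = {(v, i) for _, v, i in rel}
    bi = {(b, i) for b, _, i in rel}
    return {(b, v, i) for b, v in bv for v2, i in vi if v == v2 and (b, i) in bi}


def jd_holds(rel):
    """The join dependency *(BV, VI, BI) holds iff rejoining adds no spurious tuple."""
    return join_of_projections(rel) == rel


# Hypothetical (Buyer, Vendor, Item) facts.
r = {("b1", "v1", "i2"), ("b1", "v2", "i1"), ("b2", "v1", "i1")}

print(jd_holds(r))                         # False
print(join_of_projections(r) - r)          # {('b1', 'v1', 'i1')}: a spurious tuple
print(jd_holds(r | {("b1", "v1", "i1")}))  # True
```

When the dependency holds on every legal instance but is not implied by the candidate keys, 5NF calls for storing the three binary projections instead of the three-field table.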