VTU 5 Sem Cse DBMS Unit 6 Notes

Embed Size (px)

Citation preview

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    1/12

    UNIT -6 

    Database DesignInformal Design Guidelines for Relation Schemas; Functional Dependencies;Normal Forms Based on Primary Keys; General Definitions of Second and

    Third Normal Forms; Boyce-Codd Normal Form. 

    Fourth Normal Form; and Fifth Normal Form;

    INFORMAL DESIGHN GUIDELINES FOR RELATIONAL SCHEMA

    1.Semantics of the Attributes

    2.Reducing the Redundant Value in Tuples.

    3.Reducing Null values in Tuples.

    4.Dissallowing spurious Tuples.

    1. Semantics of the Attributes

    Whenever we are going to form relational schema there should be some meaningamong the attributes.This meaning is called semantics.This semantics relates one

    attribute to another with some relation.

    Eg:

    USN No Student name Sem

    2. Reducing the Redundant Value in Tuples

    Mixing attributes of multiple entities may cause problems

    Information is stored redundantly wasting storage

    Problems with update anomalies

    Insertion anomalies

    Deletion anomalies

    Modification anomalies

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    2/12

    The main goal of the schema diagram is to minimize the storage space that the base

    memory occupies.Grouping attributes information relations has asignificant effect onstorage space.

    Eg; 

    USN No Student name Sem

    Eg:

    Dept No Dept Name

    If we integrate these two and is used as a single table i.e Student Table

    USN No Student name Sem Dept No Dept Name

    Here whenever if we insert the tuples there may be ‘N’ stunents in one department,soDept No,Dept Name values are repeated ‘N’ times which leads to data redundancy.

    Another problem is updata anamolies ie if we insert new dept that has no students.

    If we delet the last student of a dept,then whole information about that department will be

    deleted

    If we change the value of one of the attributes of aparticaular table the we must update

    the tuples of all the students belonging to thet depy else Database will becomeinconsistent.

    Note: Design in such a way that no insertion ,deletion,modification anamolies will occur

    3. Reducing Null values in Tuples.

    Note: Relations should be designed such that their tuples will have as few NULL

    values as possible

    Attributes that are NULL frequently could be placed in separate relations (with the

    primary key)

    Reasons for nulls:

    attribute not applicable or invalid

    attribute value unknown (may exist)

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    3/12

    value known to exist, but unavailable

    4. Disallowing spurious Tuples 

    Bad designs for a relational database may result in erroneous results for certain JOIN

    operations

    The "lossless join" property is used to guarantee meaningful results for join

    operations

    Note: The relations should be designed to satisfy the lossless join condition. No

    spurious tuples should be generated by doing a natural-join of any relations.

    Functional dependency

    1.  Functional dependencies (FDs) are used to specify formal measures  of the

    "goodness" of relational designs

    2.  FDs and keys are used to define normal forms for relations

    3.  FDs are constraints that are derived from the meaning  and interrelationships  of

    the data attributes

    4.  X->Y : A set of attributes X functionally determines  a set of attributes Y if the

    value of X determines a unique value for Y

    5.  X -> Y holds if whenever two tuples have the same value for X, they must have 

    the same value for Y

    6.  For any two tuples t1 and t2 in any relation instance r(R):  If   t1[X]=t2[X], then 

    t1[Y]=t2[Y]

    7.  X -> Y in R specifies a constraint   on all relation instances r(R)

    8.  Written as X -> Y; can be displayed graphically on a relation schema as in

    Figures. ( denoted by the arrow: ).

    9.  FDs are derived from the real-world constraints on the attributes

    10. social security number determines employee name

    SSN -> ENAME

    11.project number determines project name and location

    PNUMBER -> {PNAME, PLOCATION}

    11. employee ssn and project number determines the hours per week that the

    employee works on the project

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    4/12

    {SSN, PNUMBER} -> HOURS

    Inference rules for FDs:

    Inference rules also known as Armstrong's Axioms are published by Armstrong. These

    properties are as given below:

    1.  Reflexivity property: X ->Y is true if Y is subset of X.

    2.  Augmentation property: If X->Y is true, then

    XZ ->YZ is also true.

    3.  Transitivity property: If X->Y and Y->Z then

    X ->Z is implied.

    4.  Union property: If X ->Y and X ->Z are true, then

    X ->YZ is also true. This property indicates that if right hand side of FD contains

    many attributes then FD exists for each of them.

    5.  Decomposition property: If X ->Y is implied and Z is subset of Y, then X ->Z is

    implied. This property is the reverse of union property.

    6.  Pseudotransitivity property: If X ->Y and WY ->Z are given, then XW ->Z is

    true.

    Normalization:

    Normalizing the database ensures the following things:

    Dependencies between data are identified.

    Redundant data is minimized.

    The data model is flexible and easier to maintain

    Normalization: The process of decomposing unsatisfactory "bad" relations by

    breaking up their attributes into smaller relations

    Normal form: Condition using keys and FDs of a relation to certify whether a

    relation schema is in a particular normal form

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    5/12

    FIRST NORMAL FORM 

    The purpose of first normal form (1NF) is to eliminate repeating groups of

    attributes in an entity.

    Disallows composite attributes, multivalued attributes, and nested relations;attributes whose values for an individual tuple are non-amanjuic

    Consider the following

    table:

    SECOND NORMAL FORM 

    •  The purpose of second normal form (2NF) is to eliminate partial key

    dependencies.

    •  Each attribute in an entity must depend on the whole key, not just a part of it.

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    6/12

    THIRD NORMAL FORM 

    Third Normal form also helps to eliminate redundant information by eliminating

    interdependencies between non-key attributes. 

    It is already in 2NF

    There are no non-key attributes that depend on another non-key attribute 

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    7/12

    General Normal Form Definitions (For Multiple Keys) 

    The above definitions consider the primary key only

    The following more general definitions take into account relations with multiple

    candidate keys

    A relation schema R is in second normal form (2NF) if every non-prime attribute A

    in R is fully functionally dependent on every key  of R

    Definition:

    Superkey of relation schema R - a set of attributes S of R that contains a key of R

    A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds

    in R, then either:

    X is a superkey of R, or

    A is a prime attribute of R

    Example 1

    CUSTOMER

    CustomerID Firstname Surname City PostCode

    12123 Harry Enfield London SW7 2AP

    12443 Leona Lewis London WC2H 7JY

    354 Sarah Brightman Coventry CV4 7AL

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    8/12

    This is not in strict 3NF as the City could be obtained from the Post code attribute. If you

    create a table containing postcodes then city could be derived.

    CustomerID Firstname Surname PostCode*

    12123 Harry Enfield SW7 2AP

    12443 Leona Lewis WC2H 7JY

    354 Sarah Brightman CV4 7AL

    POSTCODES

    PostCode City

    SW7 2AP London

    WC2H 7JY London

    CV4 7AL Coventry

    Example 2. 

    VideoID Title Certificate Description

    12123 Saw IV 18 Eighteen and over

    12443 Igor PG Parental Guidance

    354 Bambi U Universal Classification

    The Description of what the certificate means could be obtained frome the certifcate

    attribute - it does not need to refer to the primary key VideoID. So split it out and use theprimary key / secondary key approach.

    Example 3 

    CLIENT

    ClientID CinemaID* CinemaAddress

    12123 LON23 1 Leicester Square. London

    12443 COV2 34 Bramby St, Coventry

    354 MAN4 56 Croydon Rd, Manchester

    CINEMAS

    CinemaID CinemaAddress

    LON23 1 Leicester Square. London

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    9/12

    COV2 34 Bramby St, Coventry

    MAN4 56 Croydon Rd, Manchester

    In this case the database is almost in 3NF - for some reason the Cinema Address is being

    repeated in the Client table, even though it can be obtained from the Cinemas table. Sosimply remove the column from the client table

    BOYCE-CODD NORMAL FORM (BCNF)

    A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X

    -> A holds in R, then X is a superkey of R

    Each normal form is strictly stronger than the previous oneEvery 2NF relation is in 1NF

    Every 3NF relation is in 2NFEvery BCNF relation is in 3NFThere exist relations that are in 3NF but not in BCNF

    The goal is to have each relation in BCNF (or 3NF)

    Definition: A multivalued dependency (MVD) X —>> Y specified on relation 

    schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 

    exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also 

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    10/12

    exist in r with the following properties, where we use Z to denote 

    (R - ( X υ Y)): 

    t3[X] = t4[X] = t1[X] = t2[X]. 

    t3[Y] = t1[Y] and t4[Y] = t2[Y]. 

    t3[Z] = t2[Z] and t4[Z] = t1[Z]. 

    An MVD X —>> Y in R is called a trivial MVD if (a) Y is a subset of   X, or (b) X υ Y = R. 

    Fourth Normal Form (4NF) 

    •  Fourth normal form eliminates independent many-to-one relationships betweencolumns.

    •  To be in Fourth Normal Form,–  a relation must first be in Boyce-Codd Normal Form.

    –  a given relation may not contain more than one multi-valued attribute.•  Defined as a relation that is in Boyce-Codd Normal Form and contains no

    nontrivial multi-valued dependencies.

    Example

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    11/12

    Fifth Normal Form (5NF)

    A relation decompose into two relations must have the lossless-join property,

    which ensures that no spurious tuples are generated when relations are reunited

    through a natural join operation.

    However, there are requirements to decompose a relation into more than tworelations. Although rare, these cases are managed by join dependency and fifth

    normal form (5NF).

    Constraints as Assertions

  • 8/20/2019 VTU 5 Sem Cse DBMS Unit 6 Notes

    12/12

    General constraints: constraints that do not fit in

    the basic SQL categoriesMechanism: CREAT ASSERTION 

    Components include:

    a constraint name,

    followed by CHECK,followed by a condition

    Assertions: An Example

    “The salary of an employee must not be greater

    than the salary of the manager of the department

    that the employee works for’’CREAT ASSERTION SALARY_CONSTRAINT

    CHECK (NOT EXISTS (SELECT *

    FROM EMPLOYEE E, EMPLOYEE M,DEPARTMENT D

    WHERE E.SALARY > M.SALARY ANDE.DNO=D.NUMBER ANDD.MGRSSN=M.SSN))

    SQL TriggersObjective: to monitor a database and take initiate action when a

    condition occurs

    Triggers are expressed in a syntax similar to assertions and

    include the following:EventSuch as an insert, deleted, or update operation

    Condition

    ActionTo be taken when the condition is satisfied

    SQL Triggers: An ExampleA trigger to compare an employee’s salary to his/her

    supervisor during insert or update operations:

    CREATE TRIGGER INFORM_SUPERVISOR 

    BEFORE INSERT OR UPDATE OF 

    SALARY, SUPERVISOR_SSN ON EMPLOYEE 

    FOR EACH ROW 

    WHEN 

    (NEW.SALARY> (SELECT SALARY FROM EMPLOYEE WHERE SSN=NEW.SUPERVISOR_SSN)) 

    INFORM_SUPERVISOR (NEW.SUPERVISOR_SSN,NEW.SSN);