  • Database Systems

    Normalization

    GCE (A/L) ICT Training for Teachers 1

  • Contact Persons

    Name : Buddhika H. Kasthuriarachchy

    Email : [email protected]

    Phone : 0112 413900 ext: 4301

    Mobile : 0773607507

  • Recommended Reading

    https://sites.google.com/site/ictalnie/

    Google user name : ictalnie2011

    Password : ictalniepython

    Fundamentals of Database Systems (5th Edition), Ramez Elmasri and Shamkant B. Navathe

    Database Management Systems (2nd Edition), Raghu Ramakrishnan and Johannes Gehrke

  • Introduction

    Conceptual modeling is a subjective process.

    Therefore, the schema produced by the logical database design phase may not be very good (it may contain redundancies).

    However, there are formal techniques for checking and improving the quality of a schema.

    This process is called normalization.

  • Relational database schema = set of relations

    Relation = set of attributes

    How we group the attributes into relations is very important.

  • Too many attributes in a relation:

    Wasted space

    Anomalies

    Decomposing the relation into too many small relations puts two properties at risk:

    Lossless-join property

    Dependency-preserving property

  • Too many attributes

    For example,

    LECTURER(id, name, address, salary, deptno, dname, building)

  • Insertion Anomaly

    1. Inserting a new lecturer into the LECTURER table

    - Department information is repeated (we must ensure that the correct department information is inserted every time).

    2. Inserting a department with no lecturers

    - Impossible, because null values for id are not allowed.

  • Deletion Anomalies

    Deleting the last lecturer of a department also loses the information about that department.

  • Update Anomalies

    Updating a department's building must be done for every lecturer working in that department.
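
    The update and deletion anomalies can be reproduced on a small in-memory copy of the LECTURER relation. This is only a minimal sketch: the rows, names and buildings are made-up sample data, not taken from the slides, and address and salary are left out for brevity.

```python
# Hypothetical sample rows for LECTURER(id, name, deptno, dname, building).
lecturers = [
    {"id": 1, "name": "Ann",  "deptno": 10, "dname": "Physics", "building": "B1"},
    {"id": 2, "name": "Bob",  "deptno": 10, "dname": "Physics", "building": "B1"},
    {"id": 3, "name": "Cara", "deptno": 20, "dname": "Maths",   "building": "B2"},
]

def buildings_of(rows, deptno):
    """Return the set of buildings recorded for one department."""
    return {r["building"] for r in rows if r["deptno"] == deptno}

# Update anomaly: changing the building in only one row leaves
# department 10 with two conflicting buildings.
updated = [dict(r) for r in lecturers]
updated[0]["building"] = "B9"
inconsistent = buildings_of(updated, 10)

# Deletion anomaly: removing the last lecturer of department 20
# also removes every trace of that department.
remaining = [r for r in lecturers if r["id"] != 3]
depts_left = {r["deptno"] for r in remaining}

print(sorted(inconsistent))  # buildings now recorded for department 10
print(sorted(depts_left))    # departments still known after the delete
```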

  • When redundancies exist, we should decompose the relation into smaller relations

    Lossless-join property: if a decomposition lacks this property, we might lose information when we decompose relations

    Dependency-preserving property: the set of dependencies on S can be verified using the sets of dependencies on R1 and R2

  • Lossless-join property:

    For example, relation S is decomposed into R1 and R2:

    S             R1         R2
    S   P   D     S   P      P   D
    S1  P1  D1    S1  P1     P1  D1
    S2  P2  D2    S2  P2     P2  D2
    S3  P1  D3    S3  P1     P1  D3

  • Joining them together, we get spurious tuples

    R1 ⋈ R2
    S   P   D
    S1  P1  D1
    S1  P1  D3
    S2  P2  D2
    S3  P1  D1
    S3  P1  D3
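
    The join above can be checked mechanically. The sketch below uses the S, R1 and R2 data from the slides; the helper names project and natural_join are illustrative, not part of any library.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, removing duplicates."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two relations given as sets of (attr, value) tuples."""
    result = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            common = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in common):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result

S = [
    {"S": "S1", "P": "P1", "D": "D1"},
    {"S": "S2", "P": "P2", "D": "D2"},
    {"S": "S3", "P": "P1", "D": "D3"},
]

R1 = project(S, ["S", "P"])
R2 = project(S, ["P", "D"])
joined = natural_join(R1, R2)

# Tuples in the join that were never in S are the spurious ones.
original = {tuple(sorted(r.items())) for r in S}
spurious = joined - original

for t in sorted(spurious):
    print(dict(t))
```

    The join has five tuples although S has three; the two extra ones are (S1, P1, D3) and (S3, P1, D1).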

  • To avoid the above-mentioned issues in the relational schema, we can apply a formal process called normalization

    Normalization is based on functional dependencies

  • A functional dependency, denoted by X → Y, where X and Y are sets of attributes in relation R, specifies the following constraint:

    Let t1 and t2 be any two tuples in any given instance of relation R.

    Whenever t1[X] = t2[X], then t1[Y] = t2[Y],

    where ti[X] represents the values for X in tuple ti.
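
    The definition translates directly into a check on one relation instance. A minimal sketch follows, reusing the S(S, P, D) instance from the lossless-join example; satisfies_fd is a hypothetical helper name.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first row with a given X value fixes the Y value for it.
        if seen.setdefault(x, y) != y:
            return False
    return True

S = [
    {"S": "S1", "P": "P1", "D": "D1"},
    {"S": "S2", "P": "P2", "D": "D2"},
    {"S": "S3", "P": "P1", "D": "D3"},
]

print(satisfies_fd(S, ["S"], ["P"]))  # does S -> P hold in this instance?
print(satisfies_fd(S, ["P"], ["D"]))  # does P -> D hold in this instance?
```

    An instance can only refute an FD. An FD is a constraint on every legal instance, so passing the check on one instance does not prove that it holds for the schema.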

  • Key points:

    Redundancy is based on functional dependencies

    Therefore, normalization is based on functional dependencies

  • Given some FDs, we can usually infer additional FDs:

    A → B, B → C implies A → C

    An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.

    F+ = closure of F: the set of all FDs that are implied by F.

    How can we get F+?

  • Armstrong's Axioms (X, Y, Z are sets of attributes):

    Reflexivity: If X ⊆ Y, then Y → X

    Augmentation: If X → Y, then XZ → YZ for any Z

    Transitivity: If X → Y and Y → Z, then X → Z

    These are sound and complete inference rules for FDs!

  • A couple of additional rules (that follow from Armstrong's Axioms):

    Union: If X → Y and X → Z, then X → YZ

    Decomposition: If X → YZ, then X → Y and X → Z

    Example: Contracts(cid, sid, jid, did, pid, qty, value), with attributes abbreviated as C, S, J, D, P, Q, V, and:

    C is the key: C → CSJDPQV

    A project purchases each part using a single contract: JP → C

    A department purchases at most one part from a supplier: SD → P

    JP → C, C → CSJDPQV imply JP → CSJDPQV

    SD → P implies SDJ → JP

    SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
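
    The slides state that Union and Decomposition follow from Armstrong's Axioms without showing the steps. One possible derivation is sketched below; the half of Decomposition giving X → Z is symmetric.

```latex
\begin{align*}
\textbf{Union.}\quad
& X \to Y    && \text{(given)} \\
& X \to XY   && \text{(augment with } X\text{, since } XX = X\text{)} \\
& X \to Z    && \text{(given)} \\
& XY \to YZ  && \text{(augment with } Y\text{)} \\
& X \to YZ   && \text{(transitivity)} \\[1ex]
\textbf{Decomposition.}\quad
& X \to YZ   && \text{(given)} \\
& YZ \to Y   && \text{(reflexivity, since } Y \subseteq YZ\text{)} \\
& X \to Y    && \text{(transitivity)}
\end{align*}
```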

  • Why is F+ important?

    Consider an FD X → RHS in F+ for relation R, where X is a subset of the attributes of R.

    If RHS contains all attributes of R, then X is a superkey.

    If X is not a superkey, then values for X can repeat in different tuples, resulting in redundancy!

    So determining F+ can help us find superkeys and check for redundancy.

  • Computing the closure of a set of FDs can be expensive. (The size of the closure is exponential in the number of attributes!)

    Typically, we just want to check whether a given FD X → Y is in the closure F+ of a set of FDs F. An efficient check:

    Compute the attribute closure of X (denoted X+) with respect to F:

    the set of all attributes A such that X → A is in F+

    There is a linear-time algorithm to compute this.

    Check whether Y is contained in X+

  • Algorithm to find X+:

    closure = X;

    repeat until there is no change: {

    if there is an FD U → V in F such that U ⊆ closure

    then set closure = closure ∪ V

    }

    Does F = {A → B, B → C, CD → E} imply A → E?

    i.e., is A → E in the closure F+? Equivalently, is E in A+?

    We can use the attribute closure to find the keys of a relation. If X+ contains all attributes of the relation, then X is a superkey.
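
    The loop above can be written in a few lines of Python. This is a sketch of the simple repeat-until-no-change version, not the linear-time algorithm mentioned earlier; representing each attribute as a single letter is an assumption made for brevity.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for the attribute set attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (here, strings of single-letter attributes).
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

F = [("A", "B"), ("B", "C"), ("CD", "E")]
a_plus = attribute_closure("A", F)

print(sorted(a_plus))  # attributes determined by A
print("E" in a_plus)   # is A -> E implied by F?
```

    Here A+ = {A, B, C}. D is never reached, so CD → E cannot fire and F does not imply A → E.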

  • Schema Refinement Steps:

    Determine F for relation R

    Find all keys of R using attribute closure

    Normalize
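
    As an illustration of the second step, the sketch below finds every key of the Contracts relation by brute force: it tries attribute sets from smallest to largest and keeps those whose closure is the whole relation. The function names are illustrative, and the search is exponential in the number of attributes, so it only suits small classroom examples.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

# FDs of the Contracts example: C -> CSJDPQV, JP -> C, SD -> P
F = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
keys = candidate_keys("CSJDPQV", F)
print(["".join(sorted(k)) for k in keys])
```

    For Contracts this finds the keys C, JP and SDJ, which agrees with the FDs derived by hand in the Contracts example.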

  • There are many Normal Forms proposed to reduce redundancies

    Some of the well-known ones are:

    1st Normal Form

    2nd Normal Form

    3rd Normal Form

    Boyce-Codd Normal Form

  • Lossless-join decomposition: a decomposition of R into X and Y is lossless-join with respect to a set of FDs F if, for every instance r that satisfies F:

    π_X(r) ⋈ π_Y(r) = r

    Theorem: this condition holds if the attributes common to X and Y contain a key for either X or Y.

    We can find a lossless-join decomposition into 1NF, 2NF, 3NF and BCNF (will see later)
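
    The theorem gives a test that needs only the FDs, not the data: compute the closure of the common attributes and see whether it covers X or Y. The sketch below applies it to the EMP_PROJ relation used later in the slides; the three FDs are an assumption reconstructed from that example, and is_lossless_binary is a hypothetical helper name.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_lossless_binary(x, y, fds):
    """True if the attributes shared by x and y determine all of x or all of y."""
    x, y = set(x), set(y)
    common_closure = attribute_closure(x & y, fds)
    return x <= common_closure or y <= common_closure

# Assumed FDs for EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC).
F = [
    ({"NIC", "PNUM"}, {"HOURS"}),
    ({"NIC"}, {"ENAME"}),
    ({"PNUM"}, {"PNAME", "PLOC"}),
]

good = is_lossless_binary({"NIC", "ENAME"},
                          {"NIC", "PNUM", "HOURS", "PNAME", "PLOC"}, F)
bad = is_lossless_binary({"NIC", "PNUM", "HOURS"},
                         {"HOURS", "ENAME", "PNAME", "PLOC"}, F)
print(good, bad)
```

    In the first split the common attribute NIC is a key of (NIC, ENAME), so the test succeeds. In the second the only common attribute is HOURS, which determines nothing else, so the test fails.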

  • Dependency preserving property:

    A decomposition of a relation R, with a set of functional dependencies F, into relations X and Y is said to be dependency preserving iff F+ = (FX ∪ FY)+, where FX and FY are the projections of F onto X and Y.

    That is, a dependency-preserving decomposition allows us to enforce all FDs by examining a single relation instance at a time.

    We can always obtain a dependency-preserving decomposition into 1NF, 2NF and 3NF, but not necessarily into BCNF (will see later)
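
    Computing F+ directly is expensive, so in practice a cheaper test is commonly used: for each FD in F, grow its left side using only what can be derived inside one decomposed relation at a time, then see whether the right side is reached. The sketch below is one way to write that test; the relation R(A, B, C) with F = {A → B, B → C} is a made-up example, not from the slides.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def preserves_dependencies(fds, parts):
    """Return True if every FD in fds can be enforced within the parts."""
    parts = [set(p) for p in parts]
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What this part alone lets us derive from what we have so far.
                gained = attribute_closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(F, ["AB", "BC"]))  # split into (A,B) and (B,C)
print(preserves_dependencies(F, ["AB", "AC"]))  # split into (A,B) and (A,C)
```

    The first split keeps both FDs. In the second, B and C never appear together in one relation, so B → C can no longer be checked without a join.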

  • Review of some terms

    Superkey: a set of attributes S in relation R such that no two distinct tuples t1 and t2 will have t1[S] = t2[S]

    Key: a superkey with the additional property that removing any attribute from it leaves a set that is no longer a superkey

  • Candidate Key: Each key of a relation is called a candidate key

    Primary Key: A candidate key is chosen to be the primary key

    Prime Attribute: an attribute which is a member of a candidate key

    Nonprime Attribute: An attribute which is not prime

  • 1st Normal Form

    A relation R is in first normal form (1NF) if domains of all attributes in the relation are atomic (simple & indivisible).

  • 2nd Normal Form:

    A relation R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of R

  • Example

    EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC)

    FD1: {NIC, PNUM} → HOURS
    FD2: NIC → ENAME
    FD3: PNUM → {PNAME, PLOC}

  • 2NF decomposition of EMP_PROJ:

    EP1(NIC, PNUM, HOURS)
    EP2(NIC, ENAME)
    EP3(PNUM, PNAME, PLOC)
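
    A partial dependency can be detected by taking the closure of each proper subset of the key and looking for nonprime attributes in it. The sketch below does this for EMP_PROJ. It assumes that {NIC, PNUM} is the only key and that the FDs are the three implied by EP1, EP2 and EP3; partial_dependencies is a hypothetical helper name.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def partial_dependencies(key, attributes, fds):
    """List (subset, attribute) pairs where a proper subset of key
    determines a nonprime attribute. Assumes key is the only key."""
    key = set(key)
    nonprime = set(attributes) - key
    found = []
    for size in range(1, len(key)):
        for combo in combinations(sorted(key), size):
            determined = attribute_closure(combo, fds) & nonprime
            for attr in sorted(determined):
                found.append((combo, attr))
    return found

EMP_PROJ = {"NIC", "PNUM", "HOURS", "ENAME", "PNAME", "PLOC"}
F = [
    ({"NIC", "PNUM"}, {"HOURS"}),   # FD1
    ({"NIC"}, {"ENAME"}),           # FD2
    ({"PNUM"}, {"PNAME", "PLOC"}),  # FD3
]

violations = partial_dependencies({"NIC", "PNUM"}, EMP_PROJ, F)
for subset, attr in violations:
    print(subset, "->", attr)

# EP1 keeps only FD1, so no part of its key determines HOURS on its own.
ep1 = partial_dependencies({"NIC", "PNUM"}, {"NIC", "PNUM", "HOURS"}, F[:1])
print(ep1)
```

    ENAME depends on NIC alone, and PNAME and PLOC depend on PNUM alone, which is why EMP_PROJ is not in 2NF and is split into EP1, EP2 and EP3.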

  • 3rd Normal Form:

    A relation R is in third normal form (3NF) if

    R is in 2NF, and

    no nonprime attribute is transitively dependent on any key

  • Example,

    EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

  • 3NF decomposition of EMP_DEPT:

    ED1(ENAME, SSN, BDATE, ADD, DNUM)
    ED2(DNUM, DNAME, DMGR)
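
    The EMP_DEPT example can be checked with a short sketch. Two things here are assumptions rather than slide content: the FDs SSN → {ENAME, BDATE, ADD, DNUM} and DNUM → {DNAME, DMGR}, which are inferred from ED1 and ED2, and the test itself, which uses the general 3NF condition (every listed FD has a superkey on the left or a prime attribute on the right) in place of the 2NF-plus-transitivity wording above. It flags both partial and transitive dependencies, and it only examines the FDs it is given.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def third_nf_violations(attributes, fds, keys):
    """Return (lhs, attr) pairs where lhs is not a superkey and attr is nonprime."""
    all_attrs = set(attributes)
    prime = set().union(*[set(k) for k in keys])
    violations = []
    for lhs, rhs in fds:
        if attribute_closure(lhs, fds) == all_attrs:
            continue  # lhs is a superkey, so this FD is harmless
        for attr in sorted(set(rhs) - set(lhs)):
            if attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return violations

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADD", "DNUM", "DNAME", "DMGR"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADD", "DNUM"}),  # assumed FD
    ({"DNUM"}, {"DNAME", "DMGR"}),                 # assumed FD
]

before = third_nf_violations(EMP_DEPT, F, keys=[{"SSN"}])
ed1 = third_nf_violations({"ENAME", "SSN", "BDATE", "ADD", "DNUM"}, F[:1], keys=[{"SSN"}])
ed2 = third_nf_violations({"DNUM", "DNAME", "DMGR"}, F[1:], keys=[{"DNUM"}])
print(before)
print(ed1, ed2)
```

    In EMP_DEPT, DNAME and DMGR depend on the key SSN only through DNUM. After the split, every remaining FD has a key on its left side, so neither ED1 nor ED2 is flagged.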