16 Normalization


  • 1

    CS 338: Computer Applications in Business: Databases (Fall 2014)

    1992-2014 by Addison Wesley & Pearson Education, Inc., McGraw Hill, Cengage Learning Slides adapted and modified from Fundamentals of Database Systems (5/6) (Elmasri et al.), Database System Concepts (5/6) (Silberschatz et al.), Database Systems (Coronel et al.), Database Systems (4/5) (Connolly et al. ), Database Systems: Complete Book (Garcia-Molina et al.)

    CS 338: Computer Applications in Business: Databases

    Basics of Functional Dependencies and Normalization for Relational Databases


    Fall 2014

    Chapter 15

    Overview

    Database design may be performed using two approaches: bottom-up or top-down


    Bottom-Up Approach

    Considers basic relationships among individual attributes as the starting point and uses those to construct relation schemas

    Not popular in practice: it requires collecting a large number of binary relationships among attributes as the starting point

    Top-Down Approach

    Starts with a number of groupings of attributes into relations, where the attributes in each grouping naturally belong together

    The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met

  • 2


    Overview

    Implicit goals of the design activity


    1. Information preservation

    Maintaining all concepts, including attribute types, entity types, relationship types as well as generalization/specialization relationships

    Relational design must preserve all of these concepts originally captured in the conceptual design after the conceptual to logical design mapping

    2. Minimum redundancy

    Minimizing redundant storage of the same information and reducing the need for multiple updates to keep multiple copies of the same information consistent

    Informal Design Guidelines for Relation Schemas

    Four informal guidelines that can be used as measures of the quality of a relation schema design

    1. Making sure attribute semantics are clear

    2. Reducing redundant information in tuples

    3. Reducing the NULL values in tuples

    4. Disallowing possibility of generating spurious tuples

  • 3


    1. Making sure attribute semantics are clear


    Semantics of a relation

    Whenever we group attributes to form a relation schema, we assume that the attributes belonging to one relation have a certain real-world meaning and a proper interpretation associated with them

    Semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple

    Recall: a relation can be interpreted as a set of facts

    If conceptual design is done carefully and the mapping procedure is followed systematically, the relation schema design should have a clear meaning


  • 4


    1. Making sure attribute semantics are clear


    Easier to explain semantics of relation (indicates better design)

    Each tuple in EMPLOYEE represents an employee, with values for the employee's name (Ename), Social Security number (Ssn), birth date (Bdate), address (Address), and department number (Dnumber)

    Dnumber is a foreign key that implicitly represents a relationship between EMPLOYEE and DEPARTMENT

    Each tuple in DEPT_LOCATIONS gives a department number (Dnumber) and one of the locations of that department (Dlocation); a department's locations form a multivalued attribute

    Each tuple in WORKS_ON gives an employee's Ssn, the project number of one of the projects that the employee works on (Pnumber), and the number of hours per week that the employee works on that project (Hours)
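The three relations described above can be sketched as table definitions. This is only a minimal sketch using Python's built-in sqlite3 module: the column types, the primary keys, and the sample location values are assumptions made for illustration and are not given in the slides.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Ename   TEXT,
    Ssn     TEXT PRIMARY KEY,   -- assumed key
    Bdate   TEXT,
    Address TEXT,
    Dnumber INTEGER             -- foreign key to DEPARTMENT (not defined in this sketch)
);
CREATE TABLE DEPT_LOCATIONS (
    Dnumber   INTEGER,
    Dlocation TEXT,
    PRIMARY KEY (Dnumber, Dlocation)  -- one tuple per (department, location) pair
);
CREATE TABLE WORKS_ON (
    Ssn     TEXT,
    Pnumber INTEGER,
    Hours   REAL,
    PRIMARY KEY (Ssn, Pnumber)        -- one tuple per (employee, project) pair
);
""")

# A department with two locations needs two DEPT_LOCATIONS tuples.
conn.executemany(
    "INSERT INTO DEPT_LOCATIONS VALUES (?, ?)",
    [(5, "Bellaire"), (5, "Houston")],  # illustrative values
)
locations = [
    row[0]
    for row in conn.execute(
        "SELECT Dlocation FROM DEPT_LOCATIONS WHERE Dnumber = 5 ORDER BY Dlocation"
    )
]
print(locations)
```

Each table has a one-sentence meaning, which is the property Guideline 1 asks for.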

    Guideline 1

    Design relation schema so that it is easy to explain its meaning

    Do not combine attributes from multiple entity types and relationship types into a single relation

    Although there is nothing logically wrong with these two relations, they violate Guideline 1 by mixing attributes from distinct real-world entities

    EMP_DEPT mixes attributes of employees and departments

    EMPL_PROJ mixes attributes of employees and projects and the WORKS_ON relationship

  • 5


    2. Reducing redundant information in tuples

    A major aim of relational database design is to group attributes into relations so as to minimize data redundancy.

    Redundancy has a significant effect on storage space

    In the EMPLOYEE relation, only the department number (Dnumber) is repeated, as a foreign key, for each employee who works in the department

    2. Reducing redundant information in tuples

    Example: EMP_DEPT is the result of applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT

    Attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department
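The repetition described above can be reproduced with a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module; the column types and all sample rows are assumptions chosen for illustration, and EMPLOYEE is reduced to three attributes to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT, Dmgr_ssn TEXT);
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Ename TEXT, Dnumber INTEGER);

-- Illustrative data: three employees in department 5, one in department 4.
INSERT INTO DEPARTMENT VALUES
    (5, 'Research',       '333445555'),
    (4, 'Administration', '987654321');
INSERT INTO EMPLOYEE VALUES
    ('123456789', 'Smith',   5),
    ('333445555', 'Wong',    5),
    ('453453453', 'English', 5),
    ('987654321', 'Wallace', 4);
""")

# EMP_DEPT as the natural join of the two base relations (joined on Dnumber).
emp_dept = conn.execute(
    "SELECT Ename, Ssn, Dnumber, Dname, Dmgr_ssn "
    "FROM EMPLOYEE NATURAL JOIN DEPARTMENT ORDER BY Ssn"
).fetchall()

# Count how often the department-5 facts are stored in each design.
copies_in_join = sum(1 for row in emp_dept if row[2] == 5)
copies_in_base = conn.execute(
    "SELECT COUNT(*) FROM DEPARTMENT WHERE Dnumber = 5"
).fetchone()[0]
print(copies_in_join, copies_in_base)
```

In the joined result the name and manager of department 5 are stored once per employee, whereas the base DEPARTMENT relation stores them once.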


  • 6


    2. Reducing redundant information in tuples

    Potential benefits include:

    Updates to the data stored in the database are achieved with a minimal number of operations thus reducing the opportunities for data inconsistencies.

    Reduction in the file storage space required by the base relations thus minimizing costs.
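The first benefit can be made concrete by counting how many rows one logical change touches in each design. This is a minimal sketch using Python's built-in sqlite3 module; the table contents and the new department name 'R&D' are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT, Dmgr_ssn TEXT);
CREATE TABLE EMP_DEPT (
    Ssn TEXT PRIMARY KEY, Ename TEXT,
    Dnumber INTEGER, Dname TEXT, Dmgr_ssn TEXT
);
INSERT INTO DEPARTMENT VALUES (5, 'Research', '333445555');
INSERT INTO EMP_DEPT VALUES
    ('123456789', 'Smith',   5, 'Research', '333445555'),
    ('333445555', 'Wong',    5, 'Research', '333445555'),
    ('453453453', 'English', 5, 'Research', '333445555');
""")

# Renaming department 5 is one logical change.
# Cursor.rowcount reports how many rows each UPDATE modified.
base_rows = conn.execute(
    "UPDATE DEPARTMENT SET Dname = 'R&D' WHERE Dnumber = 5"
).rowcount
joined_rows = conn.execute(
    "UPDATE EMP_DEPT SET Dname = 'R&D' WHERE Dnumber = 5"
).rowcount
print(base_rows, joined_rows)
```

The base relation needs a single row update, while the redundant relation needs one per employee of the department; every extra row is another place where an update can be missed.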

    Problems associated with data redundancy are illustrated by the example that follows

    2. Reducing redundant information in tuples

    Example


  • 7


    2. Reducing redundant information in tuples: Update Anomalies

    StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.

    In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.

    Storing natural joins of base relations leads to an additional problem referred to as update anomalies (data inconsistencies that result from data redundancy combined with some form of data manipulation or update)

    Types of update anomalies include

    Insertion

    Deletion

    Modification

    2. Reducing redundant information in tuples: Update Anomalies

    Insertion Anomalies

    1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or NULLs if the employee does not yet work for a department

    When department values are included, we must enter all of them so that they are consistent with the corresponding values for that department in the other tuples of EMP_DEPT
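Both cases of this insertion anomaly can be sketched in a few lines. The example below uses Python's built-in sqlite3 module; the rows, the new employees, and the deliberately misspelled department name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP_DEPT (
    Ssn TEXT PRIMARY KEY, Ename TEXT,
    Dnumber INTEGER, Dname TEXT, Dmgr_ssn TEXT
);
INSERT INTO EMP_DEPT VALUES
    ('123456789', 'Smith', 5, 'Research', '333445555'),
    ('333445555', 'Wong',  5, 'Research', '333445555');
""")

# Case 1: an employee with no department yet forces NULLs
# into every department attribute.
conn.execute(
    "INSERT INTO EMP_DEPT VALUES ('999887777', 'Zelaya', NULL, NULL, NULL)"
)

# Case 2: the department values must be retyped for each new employee.
# Nothing stops a value that disagrees with the existing tuples
# (here the department name is misspelled on purpose).
conn.execute(
    "INSERT INTO EMP_DEPT VALUES ('666884444', 'Narayan', 5, 'Reserch', '333445555')"
)

null_rows = conn.execute(
    "SELECT COUNT(*) FROM EMP_DEPT WHERE Dnumber IS NULL"
).fetchone()[0]
names_for_dept_5 = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Dname FROM EMP_DEPT WHERE Dnumber = 5 ORDER BY Dname"
    )
]
print(null_rows, names_for_dept_5)
```

After the two inserts, department 5 has two different names in the same relation. With separate EMPLOYEE and DEPARTMENT relations the new employee tuple would carry only Dnumber, so the department name could not be entered inconsistently.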