Student Manual Cf182stud

Relational Database Design (Course Code CF18)

Student Notebook

ERC 2.0

Worldwide Certified Material

IBM Learning Services

V1.2.2.2


The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. The original repository material for this course has been certified as being Year 2000 compliant.

© Copyright International Business Machines Corporation 2000, 2002. All rights reserved. This document may not be reproduced in whole or in part without the prior written permission of IBM. Note to U.S. Government Users - Documentation related to restricted rights - Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

Trademarks

IBM® is a registered trademark of International Business Machines Corporation.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:

DB2
DB2 Universal Database
RACF
z/OS
400

Other company, product, and service names may be trademarks or service marks of others.

February 2002 Edition

Contents
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

Course Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Unit 1. Relational Concepts . . . . . 1-1
Unit Objectives . . . . . 1-2

1.1 Tables and Guidelines Relating to Tables . . . . . 1-3
Components of Tables . . . . . 1-4
Uniqueness of Rows and Columns . . . . . 1-6
Order of Rows and Columns . . . . . 1-8
Linkage of Tables . . . . . 1-9

Checkpoint . . . . . 1-11
Unit Summary . . . . . 1-13

Unit 2. Views and Results During Database Design . . . . . 2-1
Unit Objectives . . . . . 2-2

2.1 Data Views, Steps, and Results During Design . . . . . 2-3
Data Views During Database Design . . . . . 2-4
Conceptual View . . . . . 2-6
Storage View . . . . . 2-8
Logical View . . . . . 2-10
Design Methodology . . . . . 2-12


Unit 3. Problem Statement . . . . . 3-1
Unit Objectives . . . . . 3-2

3.1 Problem Statement for Application Domain . . . . . 3-3
Purpose and Responsibilities . . . . . 3-4
Contents of Problem Statement (1 of 3) . . . . . 3-6
Sample Problem Statement (1 of 8) . . . . . 3-8
Sample Problem Statement (2 of 8) . . . . . 3-9
Sample Problem Statement (3 of 8) . . . . . 3-10
Contents of Problem Statement (2 of 3) . . . . . 3-11
Sample Problem Statement (4 of 8) . . . . . 3-13
Sample Problem Statement (5 of 8) . . . . . 3-14
Contents of Problem Statement (3 of 3) . . . . . 3-16
Sample Problem Statement (6 of 8) . . . . . 3-18
Sample Problem Statement (7 of 8) . . . . . 3-19
Sample Problem Statement (8 of 8) . . . . . 3-21


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.


Unit 4. Entity-Relationship Model . . . . . 4-1
Unit Objectives . . . . . 4-2

4.1 Entity Types . . . . . 4-5
ER Model in Design Methodology . . . . . 4-6
Entity Types, Entity Instances, Attributes . . . . . 4-8
Properties of Entity Types and Attributes . . . . . 4-10
Representation of Entity Types . . . . . 4-13
Determining the Entity Types (1 of 2) . . . . . 4-15
Determining the Entity Types (2 of 2) . . . . . 4-17
Entity Types - A Piece of Advice . . . . . 4-19
Entity Types for CAB . . . . . 4-21

4.2 Relationship Types . . . . . 4-23
Relationship Types Between Entity Types . . . . . 4-24
Relationship Types in ER Model . . . . . 4-26
Relationship Instance Diagram . . . . . 4-29
Multiple Relationship Types for Entity Types . . . . . 4-30
Unary Relationship Types . . . . . 4-32
A Special Relationship Type . . . . . 4-34
Relationship Types - Generalized Definition . . . . . 4-35
Relationship Type on Relationship Type . . . . . 4-36
Relationship Type Versus Attribute . . . . . 4-38
Relationship Types for CAB . . . . . 4-40
Cardinalities . . . . . 4-41
Cardinalities (Example 1) . . . . . 4-44
Cardinalities (Example 2) . . . . . 4-46
Defining Attributes and Relationship Key . . . . . 4-48
Relationship Key (Example 1) . . . . . 4-50
Relationship Key (Example 2) . . . . . 4-51
Cardinalities for CAB . . . . . 4-53

4.3 Dependent Entity Types, Supertypes, and Subtypes . . . . . 4-55
A First Correction of the CAB Model . . . . . 4-56
Dependent Entity Types . . . . . 4-58
Dependent Entity Types - Characteristics . . . . . 4-60
Nondefining Attributes for Relationship Types . . . . . 4-62
Nondefining Attributes - Sample Diagram . . . . . 4-65
Attributes for a Sample Relationship Type . . . . . 4-66
Relationships on Owning Relationship Type . . . . . 4-68
Controlling Property . . . . . 4-69
Cascading Effect . . . . . 4-71
Controlling for Relationship Type Attributes . . . . . 4-73
A Second Correction of the CAB Model . . . . . 4-74
Supertype and Subtypes . . . . . 4-76
Bundle Cardinalities . . . . . 4-79
An Alternate Maintenance Record Solution . . . . . 4-82
ER Model for CAB Without Constraints . . . . . 4-84

4.4 Constraints . . . . . 4-87
Constraints . . . . . 4-88


Constraints in ER Model . . . . . 4-90
Constraints (Example 1) . . . . . 4-92
Constraints (Example 2) . . . . . 4-94
Constraints (Example 3) . . . . . 4-96
Constraints (Example 4) . . . . . 4-98

4.5 Splitting and Combining Entity-Relationship Models . . . . . 4-101
Subdivision of ER Model into Pages . . . . . 4-102
Pilot View of ER Model for CAB . . . . . 4-104
Maintenance View of ER Model for CAB . . . . . 4-106
Building an Enterprise-Wide ER Model . . . . . 4-108
Problems During Consolidation of ER Models . . . . . 4-110

Checkpoint . . . . . 4-112
Unit Summary (1 of 3) . . . . . 4-121
Unit Summary (2 of 3) . . . . . 4-122
Unit Summary (3 of 3) . . . . . 4-123

Unit 5. Data and Process Inventories . . . . . 5-1
Unit Objectives . . . . . 5-2

5.1 Data Inventory . . . . . 5-3
Data and Process Inventories in Design Process . . . . . 5-4
Data Inventory - Purpose and Responsibilities . . . . . 5-6
Contents of Data Inventory (1 of 3) . . . . . 5-9
Sample Data Types (1 of 4) . . . . . 5-12
Sample Data Types (2 of 4) . . . . . 5-14
Sample Data Types (3 of 4) . . . . . 5-15
Sample Data Types (4 of 4) . . . . . 5-16
Contents of Data Inventory (2 of 3) . . . . . 5-17
Contents of Data Inventory (3 of 3) . . . . . 5-20
Sample Data Elements and Groups (1 of 7) . . . . . 5-22
Sample Data Elements and Groups (2 of 7) . . . . . 5-24
Sample Data Elements and Groups (3 of 7) . . . . . 5-25
Sample Data Elements and Groups (4 of 7) . . . . . 5-26
Sample Data Elements and Groups (5 of 7) . . . . . 5-27
Sample Data Elements and Groups (6 of 7) . . . . . 5-28
Sample Data Elements and Groups (7 of 7) . . . . . 5-30
Methods for Establishing a Data Inventory . . . . . 5-31
Survey of Departments of Expertise . . . . . 5-32
Review of Existing Data and Programs . . . . . 5-34
Coupling of Data and Process Inventories . . . . . 5-36

5.2 Process Inventory . . . . . 5-39
Process Inventory - Purpose and Responsibilities . . . . . 5-40
Contents for a Business Process (1 of 2) . . . . . 5-42
Contents for a Business Process (2 of 2) . . . . . 5-44
Sample Business Process (1 of 5) . . . . . 5-46
Sample Business Process (2 of 5) . . . . . 5-47
Sample Business Process (3 of 5) . . . . . 5-48
A Walk Through the ER Model . . . . . 5-50



Sample Business Process (4 of 5) . . . . . 5-56
Sample Business Process (5 of 5) . . . . . 5-57
Process Decomposition . . . . . 5-58
Process Decomposition for CAB (1 of 2) . . . . . 5-60
Process Decomposition for CAB (2 of 2) . . . . . 5-62

Checkpoint . . . . . 5-63
Unit Summary (1 of 2) . . . . . 5-67
Unit Summary (2 of 2) . . . . . 5-68

Unit 6. Tuple Types . . . . . 6-1
Unit Objectives . . . . . 6-2

6.1 Establishing Tuple Types . . . . . 6-3
Tuple Types in Design Process . . . . . 6-4
Tuple Types . . . . . 6-5
Characteristics of Tuple Types . . . . . 6-7
Tuple Types for Entity Types . . . . . 6-9
Tuple Types for Relationship Types . . . . . 6-11
No Tuple Type for Relationship Type (1 of 3) . . . . . 6-13
No Tuple Type for Relationship Type (2 of 3) . . . . . 6-14
No Tuple Type for Relationship Type (3 of 3) . . . . . 6-16
Required Tuple Types for CAB . . . . . 6-17
Documentation of Tuple Types . . . . . 6-19
Tuple Types With Roles . . . . . 6-21
Some Sample Tuple Types for CAB . . . . . 6-23
A Special Consideration . . . . . 6-25

6.2 Normalization of Tuple Types . . . . . 6-27
Normalization - An Introduction . . . . . 6-28
First Normal Form - Definition . . . . . 6-30
First Normal Form - Solution . . . . . 6-31
First Normal Form - Instance Example . . . . . 6-33
First Normal Form - ER Model Correction . . . . . 6-34
First Normal Form - 2nd Example (1 of 2) . . . . . 6-35
First Normal Form - 2nd Example (2 of 2) . . . . . 6-37
Second Normal Form - Definition . . . . . 6-39
Second Normal Form - Solution . . . . . 6-41
Third Normal Form - Definition . . . . . 6-43
Third Normal Form - Solution . . . . . 6-45
Third Normal Form - Instance Example . . . . . 6-47
Third Normal Form - ER Model Correction . . . . . 6-48
Third Normal Form in Multiple Tuple Types . . . . . 6-50
3rd NF in Multiple Tuple Types (Alternative 1) . . . . . 6-51
3rd NF in Multiple Tuple Types (Alternative 2) . . . . . 6-53
Fourth Normal Form - Definition . . . . . 6-54
Fourth Normal Form - Sample Tuple Type . . . . . 6-56
Fourth Normal Form - Instance Example . . . . . 6-58
Fourth Normal Form - Solution . . . . . 6-60

Checkpoint . . . . . 6-62


Unit Summary (1 of 2) . . . . . 6-66
Unit Summary (2 of 2) . . . . . 6-67

Unit 7. From Tuple Types to Tables . . . . . 7-1
Unit Objectives . . . . . 7-2

7.1 Combining and Splitting Tuple Types . . . . . 7-5
Tables in Design Process . . . . . 7-6
Tables for Tuple Types . . . . . 7-7
Conversion of Tuple Types into Tables . . . . . 7-9
Problems With One-to-One Conversion . . . . . 7-11
Merging Partial Tuple Types . . . . . 7-13
Finding Partial Tuple Types from ER Model . . . . . 7-15
Imbedding Detail Tuple Types . . . . . 7-18
Finding Detail Tuple Types from ER Model . . . . . 7-20
Decomposition of Super Tuple Types (1 of 2) . . . . . 7-23
Decomposition of Super Tuple Types (2 of 2) . . . . . 7-25
Combining Tuple Types - Considerations . . . . . 7-26
Limitations and Consequences . . . . . 7-28
Denormalization . . . . . 7-30
Vertical Splitting of Tuple Types . . . . . 7-33
Horizontal Splitting of Tuple Types . . . . . 7-35

7.2 Physical Implementation . . . . . 7-37
Built-In Data Types . . . . . 7-38
Column Attributes - Nullable Columns . . . . . 7-41
Nullable Columns and Cardinalities . . . . . 7-43
Column Attributes - Default Values . . . . . 7-45
Selection of Default Values . . . . . 7-47
Considerations for Abstract Data Types . . . . . 7-49
User Defined Distinct Types . . . . . 7-51
User Defined Distinct Types - Example . . . . . 7-53
User Defined Functions (UDFs) . . . . . 7-55
UDFs - Definition and Invocation . . . . . 7-57
Check Constraints . . . . . 7-59
Check Constraints - Examples . . . . . 7-61
Triggers . . . . . 7-62
Triggers - Some Additional Remarks . . . . . 7-64
A Sample Abstract Data Type - Name Data . . . . . 7-66
Setting Up the Abstract Data Type . . . . . 7-67
INSERT Triggers for Abstract Data Type . . . . . 7-69
UPDATE Triggers for Abstract Data Type . . . . . 7-71
Abstract Data Type - Inserting and Updating . . . . . 7-72
Abstract Data Type - Selecting Data . . . . . 7-74
An Alternate Implementation (1 of 2) . . . . . 7-76
An Alternate Implementation (2 of 2) . . . . . 7-77
Token Translation Tables . . . . . 7-78
Token Translation Tables - An Alternative . . . . . 7-79

7.3 Documentation . . . . . 7-81



Documenting User Defined Distinct Types . . . . . 7-82
Documenting User Defined Functions (1 of 2) . . . . . 7-83
Documenting User Defined Functions (2 of 2) . . . . . 7-84
Documenting Check Constraints . . . . . 7-85
Documenting Tables - Table Info (1 of 2) . . . . . 7-87
Documenting Tables - Table Info (2 of 2) . . . . . 7-88
Documenting Tables - Column Information . . . . . 7-89
Documenting Triggers . . . . . 7-90

Checkpoint . . . . . 7-91
Unit Summary (1 of 3) . . . . . 7-97
Unit Summary (2 of 3) . . . . . 7-98
Unit Summary (3 of 3) . . . . . 7-99

Unit 8. Integrity Rules . . . . . 8-1
Unit Objectives . . . . . 8-2

8.1 Referential Integrity . . . . . 8-3
Integrity Rules in Design Process . . . . . 8-4
Integrity - Areas of Concern and Types . . . . . 8-5
Referential Integrity - Terminology . . . . . 8-7
Referential Integrity - Insert Rules . . . . . 8-11
Referential Integrity - Delete Rules . . . . . 8-13
Referential Integrity - Update Rules . . . . . 8-16
Delete Rules and ER Model (1 of 8) . . . . . 8-18
Delete Rules and ER Model (2 of 8) . . . . . 8-20
Delete Rules and ER Model (3 of 8) . . . . . 8-22
Delete Rules and ER Model (4 of 8) . . . . . 8-24
Delete Rules and ER Model (5 of 8) . . . . . 8-26
Delete Rules and ER Model (6 of 8) . . . . . 8-28
Delete Rules and ER Model (7 of 8) . . . . . 8-29
Delete Rules and ER Model (8 of 8) . . . . . 8-31
Delete Rules for an Imbed Case . . . . . 8-32
Delete Connection . . . . . 8-34
Referential Cycles . . . . . 8-36
Definition of Referential Constraints . . . . . 8-38
Referential Integrity - Documentation . . . . . 8-39
Maintenance View - Updated ER Model . . . . . 8-41
Referential Structure . . . . . 8-42
Referential Structure - Constraint Summary . . . . . 8-46

8.2 Other Types of Integrity . . . . . 8-47
Domain Integrity . . . . . 8-48
Redundancy Integrity . . . . . 8-50
Violation of Normal Forms - Trigger . . . . . 8-53
Derivable Data - Sample Triggers . . . . . 8-55
Constraint Integrity . . . . . 8-57
Constraint Integrity - Example 1 . . . . . 8-59
Constraint Integrity - Example 2 (1 of 2) . . . . . 8-61
Constraint Integrity - Example 2 (2 of 2) . . . . . 8-62


Constraint Integrity - Example 3 (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-64Constraint Integrity - Example 3 (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-66

Unit 9. Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2

9.1 Structure, Options, and Usage of Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3Indexes in Design Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4Purpose of an Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5Structure of Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7Searching Via an Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9Unique and Nonunique Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11Clustering Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-13Clustering Index - First Insertion (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15Clustering Index - First Insertion (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16Clustering Index - Second Insertion (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17Clustering Index - Second Insertion (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18Partitioning Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19Use of Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21No Index for Leading Foreign Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23Indexes - Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-25


Unit 10. Logical Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1Unit Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2

10.1 Logical Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3Logical Data Structures in Design Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4Logical Data Structures - Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-5Logical Data Structures - Responsibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7Sample Business Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9Sample Structure Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-11Sample Path and Table Summaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-17An Alternate Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-18Processes and Logical Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20Example 2 - Business Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22Example 2 - Structure Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23Example 2 - Path and Table Summaries (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . 10-26Example 2 - Path and Table Summaries (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . 10-27Example 2 - Path and Table Summaries (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . 10-28Characteristics of Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-29Usage of Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31

Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-33Unit Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-35


© Copyright IBM Corp. 2000, 2002 Contents ix

Student Notebook

Appendix A. Sample Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1Business Object Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1Business Relationship Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3Business Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5

Appendix B. Checkpoint Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X-1


Trademarks
The reader should recognize that the following terms, which appear in the content of this training document, are official trademarks of IBM or other companies:

IBM® is a registered trademark of International Business Machines Corporation.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:

DB2
DB2 Universal Database
RACF
z/OS
400

Other company, product, and service names may be trademarks or service marks of others.


Course Description
Relational Database Design

Duration: 4.0 days

Purpose

This course presents a methodology for modeling and designing relational databases.

Audience

People responsible for designing relational databases and people who need an in-depth understanding of data modeling.

Prerequisites

The course does not require any special prerequisites.

Objectives

After completing this course, you should be able to:

• Design relational databases.

• Consider logical and physical aspects including integrity requirements during the design.

Contents

This course covers the following major topics:

• Relational concepts
• Views and results during database design
• Problem statement
• Entity-relationship modeling
• Data and process inventories
• Tuple types
• From tuple types to tables
• Integrity rules
• Indexes
• Logical data structures and views


Agenda
Day 1

Relational Concepts
Views and Results During Design
Problem Statement
Exercises: Problem Statement
Review Exercises: Problem Statement
Entity-Relationship Model (Part 1)

Day 2

Entity-Relationship Model (Part 2)
Exercises: ER Model
Review Exercises: ER Model
Data and Process Inventories
Exercises: Data and Process Inventories

Day 3

Review Exercises: Data and Process Inventories
Tuple Types
Exercises: Tuple Types
Review Exercises: Tuple Types
From Tuple Types to Tables
Exercises: From Tuple Types to Tables

Day 4

Review Exercises: From Tuple Types to Tables
Integrity Rules
Exercises: Integrity Rules
Review Exercises: Integrity Rules
Indexes
Exercises: Indexes
Review Exercises: Indexes
Logical Data Structures


Unit 1. Relational Concepts
What This Unit Is About

This unit describes relational concepts important for designing relational databases.

What You Should Be Able to Do

After completing this unit, you should be able to:

• Identify the components of tables.

• Explain the rules defined by the relational data model regarding:

- The uniqueness of rows and columns
- The physical ordering of rows and columns
- The linkage of tables

How You Will Check Your Progress

Accountability:

• Checkpoint questions



Figure 1-1. Unit Objectives CF182.0

Notes:

The relational data model describes the conceptual representation of the data objects of relational databases and gives guidelines for their implementation. In this unit, we will discuss the main relational data object, the table, and some of the guidelines applicable to the implementation of tables.

Conceptually, all data in relational databases is stored in tables. Also, when data is presented to a user externally, it has the appearance of a table. Tables consist of rows and columns as we will discuss in this unit.

In this unit, we will also discuss guidelines of the relational data model pertaining to the uniqueness of rows and columns, the physical ordering of rows and columns, i.e., the stored sequence of rows and columns, and the linkage of tables. The discussions will emphasize the implications of these guidelines for the design and processing of tables.

Unit Objectives

After completion of this unit, you should be able to:

Identify the components of tables

Explain the guidelines defined by the relational data model pertaining to:

The uniqueness of rows and columns

The physical ordering of rows and columns

The linkage of tables


1.1 Tables and Guidelines Relating to Tables



Figure 1-2. Components of Tables CF182.0

Notes:

Tables are the main data object described by the relational data model. Conceptually, all data of relational databases is stored in tables. Also, all data returned to a user is presented in the form of a table.

Structurally, as with tables in books or newspapers, a table is subdivided into rows and columns. Horizontally, a table is subdivided into rows. The data stored in a row is logically related and belongs to a single object, such as a person or an aircraft model. Conversely, the data for a single object is stored in a single row.

You can compare rows to records in flat files for regular access methods or to segments in hierarchical databases. From the access method's point of view, records are unstructured. In contrast, from the database management system's perspective, rows are structured. Their structure is determined by the columns of the table.

Vertically, a table is subdivided into columns. All data stored in a column has the same semantic meaning and is of the same type. Columns have names. You can define the name of a column and should choose it so that it expresses the semantic meaning of the column.

Components of Tables

AIRCRAFT_MODEL

TYPE  MODEL  CATEGORY  MANUFACTURER  ENGINES
A340  100    JET       AIRBUS        4
B737  500    JET       BOEING        2

Callouts on the visual: ROW, COLUMN, FIELD, and VALUE (BOEING)




The columns of a table subdivide the rows of the table into fields. The fields are the actual receptacles for the data stored in a table. All rows of a table are subdivided in the same manner, i.e., have the same columns in the same order.

A field may or may not contain data. The data in a field is also referred to as the value of the field or the value of the column for the appropriate row. From the relational database management system's point of view, the data in a field is atomic and unstructured, i.e., a field contains a single value. This does not prevent the relational database management system from offering (column) functions that allow you to further manipulate the data of a column.
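The components described in these notes can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for a relational database management system (the course itself refers to DB2), and the A320 row with an empty ENGINES field is added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The columns define the structure shared by every row of the table.
conn.execute(
    """
    CREATE TABLE AIRCRAFT_MODEL (
        TYPE         TEXT,
        MODEL        TEXT,
        CATEGORY     TEXT,
        MANUFACTURER TEXT,
        ENGINES      INTEGER
    )
    """
)

# Each tuple becomes one row; each element fills one field of that row.
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?, ?)",
    [
        ("A340", "100", "JET", "AIRBUS", 4),
        ("B737", "500", "JET", "BOEING", 2),
        ("A320", "200", "JET", "AIRBUS", None),  # ENGINES left empty for illustration
    ],
)

# Column names, in the order in which they were defined.
columns = [info[1] for info in conn.execute("PRAGMA table_info(AIRCRAFT_MODEL)")]

# One field: the value of column MANUFACTURER for the B737 row.
(value,) = conn.execute(
    "SELECT MANUFACTURER FROM AIRCRAFT_MODEL WHERE TYPE = 'B737'"
).fetchone()

# A field without data is reported as None (SQL NULL).
(engines,) = conn.execute(
    "SELECT ENGINES FROM AIRCRAFT_MODEL WHERE TYPE = 'A320'"
).fetchone()

print(columns, value, engines)
```

From the database's point of view each field holds a single value, even when an application interprets that value further.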




Figure 1-3. Uniqueness of Rows and Columns CF182.0

Notes:

In contrast to records that are always retrieved in their entirety, you need not retrieve all columns of the rows. You can select particular columns by providing their names. You can also only change selected columns of the rows of a table. For this reason, the relational data model requires that all column names of a table be unique. Thus, in the example on the visual, you cannot have four columns with the name ENGINE. If you need all four columns, you must name them differently. In the example, the four columns have been renamed to ENGINE_1, ENGINE_2, ENGINE_3, and ENGINE_4.

In many cases, if you have naming conflicts for columns of a table, the semantics of the conflicting columns has not been defined sufficiently. By better defining the meaning of the columns, you may find different, more meaningful, names for the columns as is the case for the illustrated example. (The engines of an aircraft are generally referred to as Engine 1, Engine 2, and so on.)

From a design point of view, the illustrated solution may not even be the desirable solution. What happens, for example, if new aircraft models are introduced whose aircraft have more than four engines? This will be discussed later in the course.
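The column-name rule can be observed directly. The following is a minimal sketch, assuming Python's sqlite3 module rather than DB2; the replacement engine number P0109999 is a made-up value used only to show an update of a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two columns with the same name are rejected when the table is defined.
try:
    conn.execute(
        "CREATE TABLE AIRCRAFT (SERIAL_NUMBER TEXT, ENGINE TEXT, ENGINE TEXT)"
    )
    duplicate_rejected = False
except sqlite3.OperationalError:
    duplicate_rejected = True

# With distinct names, each column can be addressed individually.
conn.execute(
    "CREATE TABLE AIRCRAFT (SERIAL_NUMBER TEXT, ACQUIRED TEXT, "
    "ENGINE_1 TEXT, ENGINE_2 TEXT, ENGINE_3 TEXT, ENGINE_4 TEXT)"
)
conn.execute(
    "INSERT INTO AIRCRAFT VALUES "
    "('B238725737', '1994-07-21', 'P0102313', 'P0102314', NULL, NULL)"
)

# Change only one selected column (P0109999 is a hypothetical engine number).
conn.execute(
    "UPDATE AIRCRAFT SET ENGINE_2 = 'P0109999' WHERE SERIAL_NUMBER = 'B238725737'"
)

# Retrieve only the columns that are needed, by name.
selected = conn.execute("SELECT SERIAL_NUMBER, ENGINE_2 FROM AIRCRAFT").fetchall()
print(duplicate_rejected, selected)
```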

Uniqueness of Rows and Columns

AIRCRAFT

SERIAL_NUMBER  ACQUIRED    ENGINE    ENGINE    ENGINE    ENGINE
B238725737     1994-07-21  P0102313  P0102314
B238768737     1997-05-12  R0942497  R0942498
B167029747     1992-10-20  G0015237  G0015240  G0025635  G0025678
A11599320      1994-02-19  R0307023  R0307025
A11599320      1994-02-19  R0307023  R0307025
A203623340     1996-08-01  R0346723  R0346724  R0346743  R0346744

Column names must be unique: ENGINE_1, ENGINE_2, ENGINE_3, ENGINE_4

Rows should be unique




Just as you can retrieve or update selected columns of a table, you can retrieve, update, or delete specific rows of a table. To ensure that you can do this, the relational data model recommends that all rows be unique, i.e., that no two rows contain exactly the same data. Many relational database management systems do not enforce this rule, but some do. Therefore, if your database design is to be system-independent, you should make sure that the rows of your tables are unique. We will see later in the course how you can achieve this.

There are also other design considerations that make it highly advisable to ensure that all rows of a table are unique. A design should not be short-lived; it should last. At this moment, duplicate rows in a table may be fine because you might not intend to retrieve, update, or delete rows individually. However, your perception may change as new applications are introduced.

Ask yourself why you would want multiple identical rows. If you only need them to determine how often the event creating the rows occurred, you might be better off adding a column to the table that counts the occurrences and removing the duplicates. This may reduce the space required for your table and improve performance.

The design methodology taught in this course will insist that rows in the resulting tables are unique.

DB2 allows duplicate rows in tables, but can automatically ensure that no two rows are alike in a table.
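One common way to have the system guarantee unique rows is a primary key (or a unique index). The sketch below assumes Python's sqlite3 module instead of DB2 and treats SERIAL_NUMBER as the key, which is an assumption made here for illustration; the course introduces its own approach later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption for this sketch: SERIAL_NUMBER identifies an aircraft.
conn.execute(
    "CREATE TABLE AIRCRAFT (SERIAL_NUMBER TEXT PRIMARY KEY, ACQUIRED TEXT)"
)

row = ("A11599320", "1994-02-19")
conn.execute("INSERT INTO AIRCRAFT VALUES (?, ?)", row)

# Inserting the identical row a second time violates the key and is refused.
try:
    conn.execute("INSERT INTO AIRCRAFT VALUES (?, ?)", row)
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM AIRCRAFT").fetchone()[0]
print(duplicate_accepted, row_count)
```

Without the PRIMARY KEY clause, the second insert would succeed and the table would contain two indistinguishable rows.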




Figure 1-4. Order of Rows and Columns CF182.0

Notes:

According to the relational data model, the sequence in which the rows and columns of a table are physically stored in a relational database is completely up to the relational database management system. Conversely, the physical sequence of the rows and columns does not imply the sequence in which the rows or columns are returned if an ordering has not been requested by the end user or application. As a matter of fact, the same (unordered) retrieval request issued twice may return the rows and columns in a different sequence the second time it is issued.

This means that an application cannot rely on the physical sequence of the rows or columns in the database. If the order of the rows or columns is important to the application during retrieval, it must tell the relational database management system how the returned rows should be ordered. The ordering of the rows can be based on the values of one or more columns of the table; the order can be by ascending or descending column values. The order that can be requested is always a logical order and never a physical order.

The application can define the order in which the columns are to be returned by specifying the column names in the desired sequence in the retrieval request.
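The points made in these notes can be sketched as follows, assuming Python's sqlite3 module in place of DB2. The rows are deliberately inserted in a different sequence from the one requested on retrieval, and the column sequence is fixed by naming the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AIRCRAFT_MODEL "
    "(TYPE TEXT, MODEL TEXT, CATEGORY TEXT, MANUFACTURER TEXT, ENGINES INTEGER)"
)
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?, ?)",
    [
        ("B737", "500", "JET", "BOEING", 2),
        ("A340", "100", "JET", "AIRBUS", 4),
        ("A320", "200", "JET", "AIRBUS", 2),
    ],
)

# The column sequence is the one named in the request, not the stored one.
# The row sequence is a logical order: ascending manufacturer, then
# descending number of engines.
ordered = conn.execute(
    "SELECT TYPE, ENGINES, MANUFACTURER "
    "FROM AIRCRAFT_MODEL "
    "ORDER BY MANUFACTURER ASC, ENGINES DESC"
).fetchall()

for row in ordered:
    print(row)
```

A request without ORDER BY may happen to return rows in insertion order on a given system, but an application should not depend on that.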

Order of Rows and Columns

Unordered Retrieval

First time:

TYPE  MODEL  CATEGORY  MANUFACTURER  ENGINES

Next time:

TYPE  MODEL  CATEGORY  ENGINES  MANUFACTURER
A320  200    JET       2        AIRBUS
A340  100    JET       4        AIRBUS
B737  500    JET       2        BOEING

Next time, rows may be returned in a different sequence

Next time, columns may be returned in a different sequence





Figure 1-5. Linkage of Tables CF182.0

Notes:

A table seldom stands on its own: a relational database normally consists of multiple tables that are logically interconnected.

The visual shows two tables, AIRCRAFT_MODEL and MANUFACTURER. There is clearly an interconnection between the two tables. Each row of table AIRCRAFT_MODEL contains an identifier for the manufacturer of the corresponding aircraft model, but does not give any details for the manufacturer. The details for the manufacturer are contained in table MANUFACTURER. Logically, each row of table AIRCRAFT_MODEL with a specific manufacturer-id is interconnected with the row of table MANUFACTURER having the same manufacturer-id.

The relational data model prescribes that logical associations are not physically implemented in the relational database and that they are dynamically established, by means of Join operations, on a request-by-request basis. In particular, there are no physical pointers, such as addresses, in the columns referring to rows of other tables. The request-based joining of tables is accomplished by means of the values of the columns named in the join operation.

Linkage of Tables

AIRCRAFT_MODEL

TYPE  MODEL  CATEGORY  MANUFACTURER  ENGINES

MANUFACTURER

MID     NAME                CITY
AIRBUS  AIRBUS INDUSTRIES   TOULOUSE
BOEING  BOEING CORPORATION  SEATTLE




In the example of the visual, the joining of the rows for aircraft models A340, Model 100, and A320, Model 200, in table AIRCRAFT_MODEL with the proper manufacturer is achieved by having the same value (AIRBUS) in columns MANUFACTURER and MID, respectively. Of course, the columns must be specified in the request performing the join operation.

Similarly, all rows for aircraft models having the value BOEING in the MANUFACTURER column are joined with the appropriate row of table MANUFACTURER.

The important point is that, during relational database design, you need not worry about physical pointers. However, you will have to worry about logical relationships, which are realized through column values rather than pointers. Column values are not affected by reorganizations, whereas physical pointers may be.

The relational data model disallows externally visible pointers, but does not prohibit internal pointers (e.g., in index entries) that are not externally visible.
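The value-based linkage described in these notes can be sketched with the two tables of the visual. The example assumes Python's sqlite3 module in place of DB2 and omits the CATEGORY and ENGINES columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MANUFACTURER (MID TEXT, NAME TEXT, CITY TEXT)")
conn.execute("CREATE TABLE AIRCRAFT_MODEL (TYPE TEXT, MODEL TEXT, MANUFACTURER TEXT)")

conn.executemany(
    "INSERT INTO MANUFACTURER VALUES (?, ?, ?)",
    [
        ("AIRBUS", "AIRBUS INDUSTRIES", "TOULOUSE"),
        ("BOEING", "BOEING CORPORATION", "SEATTLE"),
    ],
)
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?)",
    [("A340", "100", "AIRBUS"), ("A320", "200", "AIRBUS"), ("B737", "500", "BOEING")],
)

# No pointers are stored: the rows are matched at request time because
# AIRCRAFT_MODEL.MANUFACTURER and MANUFACTURER.MID hold the same value.
joined = conn.execute(
    "SELECT A.TYPE, A.MODEL, M.NAME, M.CITY "
    "FROM AIRCRAFT_MODEL A JOIN MANUFACTURER M ON A.MANUFACTURER = M.MID "
    "ORDER BY A.TYPE"
).fetchall()

for row in joined:
    print(row)
```

Both join columns must be named in the request; nothing in the stored rows ties the tables together other than the matching values.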




Checkpoint
Exercise — Unit Checkpoint

1. What is the purpose of the relational data model?

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. Which one of the following choices is the main data object of relational databases?

a. Column.

b. Row.

c. Table.

d. Field.

3. A table is horizontally structured into columns and vertically into rows? (T/F)

4. What is a field of a table?

_____________________________________________________

_____________________________________________________

_____________________________________________________

5. A field may or may not contain a value? (T/F)

6. All rows of a table have the same structure (columns)? (T/F)

7. Which of the following statements are true?

a. All columns of a table must have a name.

b. The names of the columns of a table must be unique.

c. The rows of a table must be unique.

d. The rows of a table should be unique.




8. Why should all rows of a table be different?

_____________________________________________________

_____________________________________________________

_____________________________________________________

9. If a logical order of the rows is not requested, the rows of a table are always made available in the sequence they are physically stored in. (T/F)

10. For an unordered retrieval request, different executions of the request may return the retrieved rows in a different sequence. (T/F)


a. Tables in a relational database are interconnected by means of pointers.

b. Even internally, relational database management systems do not use pointers.

c. Tables are joined based on column values.





Figure 1-6. Unit Summary CF182.0

Notes:

Unit Summary

In relational databases, all data is stored in tables

Tables are structured in rows and columns

Horizontally, into rows

Vertically, into columns

Fields may or may not contain data

The data in fields is considered atomic

The column names of a table must be unique

All rows of a table should be different

Unless you request a specific order, you cannot assume that the rows returned are in an order

Unless you request a specific order, you cannot assume that the columns returned are in an order



Unit 2. Views and Results During Database Design

What This Unit Is About

This unit describes the different views assumed for the data of the application domain during relational database design. It also outlines the steps performed during design and their results.



What You Should Be Able to Do

After completing this unit, you should be able to:

• Explain the different views assumed for the data during database design:

- The conceptual view
- The storage view
- The logical view

• Summarize the steps performed during database design and their results.

• Relate the steps and results to the data views.


How You Will Check Your Progress

Accountability:

• Checkpoint questions





Notes:

When designing a relational database, different views are assumed for the data of the subject application domain. These views are:

• The conceptual view
• The storage view
• The logical view

During this unit, we will discuss these data views, give an overview of the steps performed during database design, list their results, and relate the results of the steps to the data views.

Unit Objectives

Explain the different views assumed for the data during database design:

The Conceptual View

The Storage View

The Logical View

Summarize the steps performed during database design and their results

Relate the results of the steps to the data views




2.1 Data Views, Steps, and Results During Design



Figure 2-2. Data Views During Database Design CF182.0

Notes:

When designing the database for an application domain, you start with a problem statement for the application domain, i.e., a document describing the types of business objects for the application domain, the relationships between them, and the business constraints for both of them. The problem statement must be established by an application domain expert (analyst). In general, it is not produced by the database designer who does not have the domain expertise, but it is input for him/her.

Starting with the problem statement, a series of steps is performed during the design. These steps look at the data of the application domain from three different angles, called views:

• The conceptual view
• The storage view
• The logical view

For each of these views, a set of results is produced during database design. You can associate a view with its results and describe it by its results. For this reason, it is quite common to say "the ... view consists of ..." rather than "the ... view establishes ...". In the latter case, the view is seen more as the activity of looking at the application domain from a specific angle and producing certain results whereas, in the former case, the view is seen as the results produced. During this course, both terminologies are used.

Data Views During Database Design

Problem Statement

Conceptual View: Entity-Relationship Model, Data Inventory

Storage View: Tuple Types, Tables, Integrity Rules, Indexes

Logical View: Process Inventory, Logical Data Structures

The conceptual view scrutinizes and structures the data of the application domain based on their semantic meaning, i.e., their meaning for the business (application domain). It does this independently of the business processes accessing the data and without regard to any existing or planned method for storing the data.

Thus, during the conceptual view, the process- and implementation-independent architecture of the data of the application domain is established.

The storage view looks at the data of the application domain from a storage point of view. During the storage view, in a series of steps, the objects of the conceptual view are mapped into objects (in particular, tables) of the relational database management system chosen for the implementation of the data. Thus, the storage view is not an implementation-independent view of the data. Rather, it is an implementation-oriented view of the data during which the conceptual view is physically implemented in the selected relational database management system.

The logical view looks at the data of the application domain from a process point of view. Generally, a particular business process does not access all data of the application domain, but only a part of the data. Thus, it has its own process-dependent view of the data of the application domain. Accordingly, during the logical view, the process-dependent views for the business processes of the application domain are established.




Figure 2-3. Conceptual View CF182.0

Notes:

As mentioned before, during the conceptual view, the process- and implementation-independent architecture of the data of the application domain is established. The application domain, described by the problem statement, is scrutinized for its business object types, the relationships between them, and the business constraints.

As a result of this scrutiny, an entity-relationship model is established visualizing and structuring the business object types of the application domain as entity types; illustrating the relationships between the business object types by means of relationship types; and modeling the constraints for the entity types and relationship types imposed by the business constraints.

In a second step, which is not directly performed by the database designer, but requires his/her participation, the elementary data of the application domain, referred to as data elements, are identified and described in detail. The descriptions are recorded in a document, the data inventory.

As the data elements are collected, they are assigned to the business object types to which they belong. More precisely, they are assigned to the corresponding entity types, which verifies whether or not the entity-relationship model established during the previous step is complete.

Conceptual View

Labels on the visual: Problem Statement, Conceptual View, Data Inventory, Process Inventory, Tuple Types, Tables, Integrity Rules, Indexes



Figure 2-4. Storage View CF182.0

Notes:

During the storage view, the conceptual view is physically implemented in a relational database management system. More precisely, the results of the conceptual view are implemented.

The steps executed during the storage view transform the results of the conceptual view into tables and related objects of the chosen relational database management system. The initial steps are mostly independent of the chosen relational database management system. The further you proceed, the more system-dependent aspects have to be considered, although many of the considerations are of a global nature.

The first step of the storage view uses the data inventory to construct tuple types for the entity types and relationship types of the entity-relationship model developed during the conceptual view and normalizes them. Tuple types are the precursors of tables and provide the basis for the computerized processing of the entity types and relationship types for the application domain. During the normalization of the tuple types, data redundancies and anomalies that could lead to data inconsistencies are resolved.

Storage View

Labels on the visual: Problem Statement, Conceptual View, Data Inventory, Storage View, Tuple Types, Tables, Integrity Rules, Indexes, Logical View, Process Inventory

Based on a prescribed set of rules, the next step of the storage view converts the tuple types into tables of the chosen relational database management system taking into account the supported functions and features.
Also as part of the storage view, any rules concerning the integrity of the data, including the constraints defined as part of the entity-relationship model, must be converted into rules for the tables created by the previous step and implemented if the chosen relational database management system provides the necessary functions such as check constraints, referential constraints, and triggers.

Some of the associations between the tables (especially those implied by referential constraints) make it imperative that indexes be defined for certain columns of the tables. The last step of the storage view defines these indexes.
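As a rough illustration of the functions named above, the following sketch defines a check constraint, a referential constraint, and an index on the referencing column. It assumes Python's sqlite3 module rather than DB2; the rule that ENGINES must be greater than zero and the index name AIRCRAFT_MODEL_MID_IX are hypothetical and not taken from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces referential constraints only when this setting is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE MANUFACTURER (MID TEXT PRIMARY KEY, NAME TEXT)")
conn.execute(
    """
    CREATE TABLE AIRCRAFT_MODEL (
        TYPE         TEXT,
        MODEL        TEXT,
        MANUFACTURER TEXT REFERENCES MANUFACTURER (MID),  -- referential constraint
        ENGINES      INTEGER CHECK (ENGINES > 0),         -- hypothetical check constraint
        PRIMARY KEY (TYPE, MODEL)
    )
    """
)
# Index on the column that refers to the other table (hypothetical name).
conn.execute("CREATE INDEX AIRCRAFT_MODEL_MID_IX ON AIRCRAFT_MODEL (MANUFACTURER)")

conn.execute("INSERT INTO MANUFACTURER VALUES ('AIRBUS', 'AIRBUS INDUSTRIES')")
conn.execute("INSERT INTO AIRCRAFT_MODEL VALUES ('A340', '100', 'AIRBUS', 4)")


def rejected(sql):
    """Return True if the statement is refused by an integrity rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# BOEING has no row in MANUFACTURER, so the referential constraint fails.
unknown_manufacturer = rejected(
    "INSERT INTO AIRCRAFT_MODEL VALUES ('B737', '500', 'BOEING', 2)"
)
# Zero engines violates the check constraint.
zero_engines = rejected(
    "INSERT INTO AIRCRAFT_MODEL VALUES ('A320', '200', 'AIRBUS', 0)"
)
print(unknown_manufacturer, zero_engines)
```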




Figure 2-5. Logical View CF182.0

Notes:

The logical view looks at the data of the application domain from the perspective of the processes for the application domain. During the logical view, the required process-specific views for the processes of the application domain are established.

The first step during the logical view for an application domain describes, in an implementation- and database-independent fashion, the processes retrieving and/or manipulating the data of the application domain. The process descriptions are collected in a document referred to as the process inventory. For each process, the description must identify the data elements used by the process.

After the tables for the application domain and the integrity rules for them have been defined, as part of the logical view, the necessary logical data structures are established for all processes described in the process inventory. Each logical data structure describes a view that a process (or part of a process) has of the tables defined during the storage view. More precisely, the logical data structure describes the subset of the tables for the application domain required by the process or a part of the process. It also illustrates how the process or the part of the process must logically navigate through the tables to achieve its function. Thus, it reflects the tables needed, the subsets required, and the flow between the tables.

Logical View

Labels on the visual: Problem Statement, Conceptual View, Data Inventory, Storage View, Tuple Types, Tables, Integrity Rules, Indexes, Logical View, Process Inventory

As you can see by now, the steps of the various views are interconnected and may be dependent on each other. The process inventory of the logical view is the primary source for the data inventory since it identifies the data elements used by the processes. Similarly, the tables and the integrity rules of the storage view are required input for the logical data structures.
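A logical data structure is a design document, not a database object. Still, the idea that a process sees only the subset of the data it needs can be illustrated with an SQL view. The sketch below assumes Python's sqlite3 module; the process that only needs jet models, the view name JET_MODEL, and the non-jet row are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AIRCRAFT_MODEL "
    "(TYPE TEXT, MODEL TEXT, CATEGORY TEXT, MANUFACTURER TEXT, ENGINES INTEGER)"
)
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?, ?)",
    [
        ("A340", "100", "JET", "AIRBUS", 4),
        ("B737", "500", "JET", "BOEING", 2),
        ("X100", "1", "PROP", "EXAMPLE", 2),  # hypothetical non-jet row
    ],
)

# Hypothetical process: it needs only type, model, and manufacturer of jets,
# so it is given a view exposing just those columns and rows.
conn.execute(
    "CREATE VIEW JET_MODEL AS "
    "SELECT TYPE, MODEL, MANUFACTURER FROM AIRCRAFT_MODEL WHERE CATEGORY = 'JET'"
)

jet_rows = conn.execute("SELECT * FROM JET_MODEL ORDER BY TYPE").fetchall()
print(jet_rows)
```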




Figure 2-6. Design Methodology CF182.0

Notes:

This visual illustrates the complete design methodology used during the course. The entity-relationship model developed during the conceptual view is used and updated by the later steps of the design as additional knowledge becomes available. This ensures that the model remains valid and useful at all times.

The methodology described on the previous pages and illustrated by the diagram on this visual is not a pure top-down approach. Design is and must be an iterative process. When you start with the design, your knowledge of the application domain is most likely incomplete even if the problem statement was prepared carefully. No matter how thoroughly you execute the various steps, subsequent steps will detect holes and errors in the results of the preceding steps that will force you to revisit these steps.

Unless the problem statement is incomplete, you should always start an iteration with the entity-relationship model. Check if the required change impacts the entity-relationship model. If it does not, proceed to the next step and verify its results.

If the problem detected reveals that the problem statement is incomplete or incorrect, have it extended or corrected by the application domain expert. It is not your responsibility, as the database designer, to change the problem statement. This must be done by a person with the proper domain competence. However, it is your responsibility to make the application domain expert aware of the problem. After the problem statement has been corrected, continue the iteration with the entity-relationship model as before.

The fact that relational database design is an iterative process should not make you sloppy. The better the problem statement and the more carefully the various steps are performed, the better your design will be. However, it does not make sense to dwell endlessly on a specific step.




Checkpoint


1. What is the problem statement?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. Who normally establishes the problem statement for an application domain?

a. The database designer.

b. The database administrator.

c. An application domain expert.

d. An application programmer.

e. The end users.

3. Match the following terms with the corresponding data views:

a. Physical implementation ____ Conceptual view

b. Process-dependent views ____ Storage view

c. Independent data architecture ____ Logical view

4. During the conceptual view, the data of the application domain are structured taking into account the business processes for the application domain. (T/F)

5. The logical view looks at the data of the application domain from the viewpoint of the business processes for the application domain. (T/F)




6. Match the three data views with the results produced by them:

a. Conceptual view ____ Tuple types

b. Storage view ____ Entity-relationship model

c. Logical view ____ Data inventory

____ Process inventory

____ Tables

____ Integrity rules

____ Logical data structures

____ Indexes

7. What is the purpose of an entity-relationship model?

_____________________________________________________

_____________________________________________________

_____________________________________________________

8. What is the data inventory?

_____________________________________________________

_____________________________________________________

9.

a. Tuple types are the precursors of tables.

b. Tuple types provide the basis for the computerized processing of the business objects and of the relationships between them.

10. What is the purpose of a logical data structure?

_____________________________________________________

_____________________________________________________

_____________________________________________________

11. The design methodology taught in this unit is a waterfall approach, i.e., the various steps of the methodology are processed from top to bottom and are not reiterated. (T/F)





Notes:

Unit Summary

The data views taken during database design are:

• The conceptual view

• The storage view

• The logical view

The conceptual view looks at the data from the application domain perspective

• Process- and implementation-independent architecture of data of application domain

The storage view looks at the data from a physical (storage) point of view

• Physical implementation of conceptual view

The logical view looks at the data from the process point of view

• Collection of process-dependent views




Unit 3. Problem Statement

This unit describes the purpose and contents of the problem statement for an application domain. It discusses the responsibility for the problem statement and the role of the database designer in its creation.



• Explain the purpose of the problem statement for database design.

• Understand who has the responsibility for the creation of the problem statement.

• Describe the role of the database designer in the creation of the problem statement.

• Describe the contents of a problem statement.


Accountability:

• Checkpoint questions

• Exercises




Notes:

When designing the database for an application domain, you start with a problem statement for the application domain. This unit discusses the problem statement in detail and describes:

• The purpose of the problem statement

• Who is responsible for the creation of the problem statement

• The role of the database designer in the creation of the problem statement

• The contents the problem statement should have to be a usable input for database design.

Unit Objectives

• Explain the purpose of the problem statement for database design

• Understand who has the responsibility for the creation of the problem statement

• Describe the role of the database designer in the creation of the problem statement

• Describe the contents of a problem statement




3.1 Problem Statement for Application Domain



Figure 3-2. Purpose and Responsibilities

Notes:

The problem statement must be created by someone who has detailed knowledge of the application domain, i.e., an application domain expert. Only then will the problem statement reflect the application domain correctly and completely.

The problem statement is input for the database designer and is a global description of the application domain for which the database designer is to develop a database. It is a global description rather than a detailed description. This means it describes the important characteristics of the application domain rather than the various data elements and processes of the application domain or any implementation-dependent details. It should be a functional description of the application domain. It should not describe the current or a planned implementation. It must allow the database designer to:

• Gain a basic understanding of the application domain so that he/she can comprehend the context of business objects, business relationships, and business constraints important for the design; detect inconsistencies; and discuss problems detected during the design with sufficient ease and knowledge with the application domain expert or the responsible department.

• Create an entity-relationship model for the application domain visualizing and clarifying the types of business objects and business relationships and the business constraints to be implemented in the database.

As mentioned before, the problem statement should not contain detailed information about the application domain. The detailed information about the application domain is provided by the data inventory and the process inventory discussed later.

Although the application domain expert is responsible for the problem statement, the database designer should work with the application domain expert during the creation of the problem statement. The database designer knows best which input he/she needs for the design of the appropriate database and, thus, can provide the necessary guidance to the application domain expert.

Furthermore, by working with the application domain expert, the database designer will gain a better understanding of the application domain, easing his/her work considerably.

Purpose and Responsibilities

• Created by application domain expert

• Database designer should work with application domain expert

• Input for database designer

• Global description of application domain

• Does not contain detailed information

• Must allow the database designer to:

  - Gain a basic understanding of the application domain

  - Create an entity-relationship model for the application domain




Figure 3-3. Contents of Problem Statement (1 of 3)

Notes:

First, the problem statement should contain a textual description, an overview, of the application domain. This overview should describe, in simple words, what the application domain does so that the database designer gets at least a general idea of the business activities involved.

Furthermore, the overview should indicate what the application domain (or more precisely, the appropriate departments) wants to achieve by using a database management system. In particular, the overview should point out which areas of the application domain should be implemented in the target database. The actual application domain may be much larger, and it may not be intended or possible to implement the entire domain in the database.

Secondly, the problem statement should list and describe all categories (types) of business objects which are important for the application domain and about which information is to be stored in the target database. These categories are referred to as business object types.

As a category, a business object type represents all business objects having the same meaning and characteristics rather than distinct business objects. For an airline company, for example, the problem statement should describe that information about aircraft models in general is to be stored rather than about a specific aircraft model such as a Boeing 747, Model 400.

For each business object type to be implemented in the target database, the following information should be provided:

• A textual description of the semantics of the business object type without going into details such as the individual attributes of the business object type. The details for the business object types will be provided via the data inventory discussed later.

• How the distinct objects belonging to the business object type can be identified.

Contents of Problem Statement (1 of 3)

• A short textual description (overview) of the application domain

  - What the application domain does

  - What the application domain wants to achieve by means of the database management system

• A listing of all business object types about which information is to be stored including:

  - A textual description of the business object types (no details yet)

  - How the individual objects of the business object type can be identified




Figure 3-4. Sample Problem Statement (1 of 8)

Notes:

Throughout this course, we will use a sample application domain to demonstrate the various items discussed. This application domain comprises the flight planning, pilot assignment, and aircraft maintenance for an airline called Come Aboard or, in short, CAB.

This visual illustrates the overview section of the problem statement for our sample application domain. The amount of information provided in the overview depends on the general familiarity of the application domain. If the application domain is less known or more complex, the overview will require more information. The sample application domain used in this course is generally well-known and is not really complex. Consequently, the short description on the visual should be sufficient.

You should note that the second paragraph limits the application domain being considered to the flight planning, the pilot assignment, and the aircraft maintenance. Without this restriction, the application domain for the management of an airline would comprise additional areas such as flight reservation or seat selection.

The entire problem statement for the sample application domain can be found in Appendix A - Sample Problem Statement.

Sample Problem Statement (1 of 8)

Overview

Come ABoard (CAB) is an airline servicing a set of airports with its aircraft. As employees, it has pilots flying the aircraft, mechanics maintaining and servicing the aircraft, and other personnel for various service functions.

CAB wants to administer flight planning, pilot assignment, and aircraft maintenance activities by means of a database management system.






Notes:

This visual illustrates a business object type for our sample airline company called CAB.

Aircraft Models is a business object type for the application domain being considered since flight planning, pilot assignment, and aircraft maintenance are dealing with aircraft models. When a flight is planned and an aircraft is assigned to the flight, that aircraft cannot be an arbitrary aircraft. It must be an aircraft of a specific aircraft model because the model is published in the timetables and the starting and landing airports require the aircraft to be of a certain model.

Similarly, pilots are only allowed to fly aircraft of those models they have a license for, and mechanics may only service aircraft of models they have been trained for.

Aircraft Models is a business object type rather than a business object because it represents a set of objects with the same meaning (being models of aircraft) and the same characteristics such as manufacturer, category (jet or turboprop), or number of engines.

As highlighted on the visual, the individual aircraft models can be uniquely identified by their type code (e.g., B737) together with their model number (e.g., 500).


Business Object Types

CAB wants to store information about the following business object types in its database:

AIRCRAFT MODELS

For its flying activities, CAB uses aircraft of different types or, more precisely, models such as Boeing 737, Model 500, or Airbus A320, Model 200. For the aircraft models it owns or has on order, CAB wants to maintain information in its database such as their category (e.g., JET or TURBOPROP), length, height, wing span, or number of engines.

Unique identifier: The aircraft models can be uniquely identified by their type code (e.g., B737) together with their model number (e.g., 500).





Notes:

This visual illustrates two further business object types for our sample application domain: Aircraft and Airports. Again, both of these types are of interest for the application domain and represent true categories. Each of them represents a set of objects having the same meaning and the same characteristics.

As highlighted on the visual, the various aircraft of the business object type Aircraft can be identified by means of a unique serial number. Even for aircraft of different models, this serial number is unique, i.e., no two aircraft can have the same serial number.

For airports, their international airport codes (e.g., SFO for San Francisco, CA, JFK for John F. Kennedy Airport in New York, NY, or STR for Stuttgart, Germany) serve as unique identifiers.

The full set of business object types for the CAB application domain can be found in Appendix A - Sample Problem Statement.
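The three business object types discussed so far, together with their unique identifiers, can be written down compactly. The following Python fragment is only a sketch: the class `BusinessObjectType`, its field names, and the helper `is_unique` are assumptions made for illustration and are not part of the course material. It records what the problem statement provides at this stage (a description and the identifier) and deliberately omits individual attributes, which belong to the data inventory.

```python
from dataclasses import dataclass


# Hypothetical representation of a business object type as described in a
# problem statement: a name, a short description, and its unique identifier.
@dataclass(frozen=True)
class BusinessObjectType:
    name: str
    description: str
    identifier: tuple  # names of the items that together identify an object


AIRCRAFT_MODELS = BusinessObjectType(
    name="Aircraft Models",
    description="Models of aircraft that CAB owns or has on order",
    identifier=("type code", "model number"),
)
AIRCRAFT = BusinessObjectType(
    name="Aircraft",
    description="Individual aircraft owned by CAB",
    identifier=("serial number",),
)
AIRPORTS = BusinessObjectType(
    name="Airports",
    description="Airports that CAB services or plans to service",
    identifier=("airport code",),
)


def is_unique(objects, object_type):
    """Return True if no two objects share the same identifier values."""
    keys = [tuple(obj[item] for item in object_type.identifier) for obj in objects]
    return len(keys) == len(set(keys))
```

Note that the identifier of Aircraft Models consists of two items, whereas Aircraft and Airports are each identified by a single item.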


AIRCRAFT

CAB owns multiple aircraft of the various aircraft models. For the aircraft it owns, CAB wants to maintain information such as the date when the aircraft was acquired, the engines mounted on the aircraft, or the seats of the aircraft.

Unique identifier: Each aircraft has a unique serial number. This serial number is unique across aircraft models.

AIRPORTS

CAB services a set of airports with its aircraft. For these airports as well as for airports CAB plans to service in the near future, CAB wants to keep information in its database such as the airport code, the location of the airport, the address of CAB's city ticketing office, or the address of CAB's airport office.

The airport codes uniquely identify the various airports.






Notes:

As a third item, the problem statement should contain a listing of all types (categories) of logical relationships that exist between business objects of the various business object types. A business relationship logically interconnects two or more business objects which may belong to different or to the same business object type. The objects may even be identical, i.e., an object may have a relationship with itself.

Business relationships of the same type always interconnect business objects of the same respective types. For example, if r1 and r2 are business relationships of the same type and r1 associates an object of business object type O1 with an object of business object type O2, then r2 must also interconnect an object of O1 with an object of O2.

Note that we are talking about types or categories of relationships, referred to as business relationship types, rather than individual relationships between business objects. For the problem statement, it is not important which specific business objects have a relationship with each other. It is only important to identify the type of the business relationship and to understand its semantics and characteristics. For the sample airline application domain, for example, it is only important to know that aircraft belong to aircraft models and that an individual aircraft always belongs to one and only one model. For the problem statement, it is not important to know that the aircraft with serial number B238725737 is a Boeing 737, Model 500.

Depending on the business object type from which you look at the business relationship type, there are different (directional) views of the same business relationship type. For the above airline example, you may look at the business relationship type from Aircraft's point of view or from Aircraft Models' point of view. From Aircraft's point of view, the semantics is that an aircraft belongs to an aircraft model; from Aircraft Models' point of view, the meaning is that a specific aircraft model comprises an aircraft. As expected, the meanings are complementary. You can think of them as separate directional business relationship types that make up a single nondirectional (or bidirectional) business relationship type.

For each business relationship type, the problem statement should include:

• A textual description of the business relationship type, i.e., describe its meaning and the business object types involved.

• How many relationships of the same type an object can have. The important fact is whether it can have many relationships or at most one.

• If the type of the relationship requires every existing object of an object type to have at least one relationship of the considered business relationship type.

• If the objects having a relationship of the considered type with an object must be deleted as well if that object is deleted, i.e., the consequences of delete operations on objects that are interconnected by means of relationships.

It is possible that the objects of two business object types are interconnected by multiple (different) business relationship types.

• A listing of all types of relationships between the business objects including:

  - A textual description of the type of relationship

  - How many relationships of the same type an object can have (if the object may have many or at most one relationship of that type)

  - If the type of relationship requires an object to have at least one relationship

  - If the objects having a relationship with an object must be deleted when the other object is deleted

• A business relationship always concerns at least two business objects

  - The objects can belong to the same or different object types

  - If a first object has a relationship with a second object, the second object also has a relationship with the first object

• All relationships of the same type have the same meaning and concern objects of the same business object types






Notes:

This visual illustrates a business relationship type for our sample application domain. As mentioned before, there exists a business relationship type linking objects of business object type Aircraft Models to objects of business object type Aircraft.

From Aircraft Models' point of view, the meaning of the business relationship type is that an aircraft model comprises an aircraft. Conversely, from Aircraft's point of view, the meaning is that an aircraft belongs to an aircraft model.

As highlighted on the visual, there may be many aircraft for an aircraft model, but it is also possible that an aircraft model does not have any aircraft.

A given aircraft, in contrast, can only belong to a single aircraft model. Furthermore, an aircraft must always belong to an aircraft model. Accordingly, every (existing) aircraft must have a relationship to an aircraft model. Thus, from Aircraft's point of view, the business relationship type is a mandatory business relationship type. From Aircraft Models' point of view, the business relationship type is not mandatory because an aircraft model need not have a relationship to an aircraft.
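The two directional views just described, and the notion of a mandatory business relationship type, can be sketched in a few lines of Python. The class `DirectionalView` and its fields `minimum` and `maximum` are hypothetical names chosen for this illustration; the course itself only states the facts in prose (0 to many aircraft per aircraft model, exactly 1 aircraft model per aircraft).

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of one directional view of a business relationship type.
@dataclass(frozen=True)
class DirectionalView:
    source: str             # business object type the view starts from
    target: str             # business object type on the other side
    meaning: str            # how the relationship reads in this direction
    minimum: int            # 0 = relationship optional, 1 = at least one required
    maximum: Optional[int]  # None stands for "many"

    @property
    def mandatory(self) -> bool:
        """A view is mandatory if every source object needs a relationship."""
        return self.minimum >= 1

    def allows(self, count: int) -> bool:
        """Check whether a source object may have `count` relationships."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


# 1 aircraft model ~ 0 to many aircraft
MODEL_COMPRISES_AIRCRAFT = DirectionalView(
    "Aircraft Models", "Aircraft", "comprises", minimum=0, maximum=None
)
# 1 aircraft ~ 1 to 1 aircraft model
AIRCRAFT_BELONGS_TO_MODEL = DirectionalView(
    "Aircraft", "Aircraft Models", "belongs to", minimum=1, maximum=1
)
```

In this sketch, the question of whether a view is mandatory reduces to whether its minimum is at least one, which is why only the view from Aircraft's side is mandatory.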


Business Relationship Types

The following types of relationships exist between the business objects which CAB wants to implement in its database:

AIRCRAFT MODELS - AIRCRAFT

For an aircraft model, CAB may have any number of aircraft. In particular, it is possible that there are no aircraft (yet) for an aircraft model. Conversely, an aircraft belongs to one and only one aircraft model.

1 aircraft model ~ 0 to many aircraft

1 aircraft ~ 1 to 1 aircraft model (mandatory relationship type)





Notes:

This visual illustrates another business relationship type for CAB. This business relationship type interrelates aircraft and maintenance records: The objects of business object type Aircraft (may) have relationships with objects of business object type Maintenance Records (Aircraft Has Maintenance Record). Conversely, each object of Maintenance Records must have a relationship to one and only one object of Aircraft (Maintenance Record for Aircraft).

As the description states, all maintenance records for an aircraft are to be deleted when the aircraft is deleted. From Aircraft's point of view, the business relationship type is a cascading business relationship type because delete operations are cascaded down to the associated objects of the other business object type.

The above description of the business relationship type does not match with the description in Appendix A - Sample Problem Statement since we want to illustrate a cascading business relationship type. The description in Appendix A - Sample Problem Statement has some peculiarities which will be discussed later.
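The effect of a cascading business relationship type can be made concrete with a small experiment. The problem statement itself stays implementation-independent, so the following is only a look ahead at one possible realization, sketched with Python's built-in `sqlite3` module. The table and column names are assumptions made for this example; the serial number is the one used earlier in this unit.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE aircraft (serial_no TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE maintenance_record (
        record_no INTEGER PRIMARY KEY,
        -- NOT NULL: a maintenance record applies to one and only one aircraft
        -- ON DELETE CASCADE: records are deleted together with their aircraft
        serial_no TEXT NOT NULL
            REFERENCES aircraft (serial_no) ON DELETE CASCADE
    )
    """
)

conn.execute("INSERT INTO aircraft VALUES ('B238725737')")
conn.executemany(
    "INSERT INTO maintenance_record (record_no, serial_no) VALUES (?, ?)",
    [(1, "B238725737"), (2, "B238725737")],
)


def record_count(connection):
    """Return the number of maintenance records currently stored."""
    return connection.execute(
        "SELECT COUNT(*) FROM maintenance_record"
    ).fetchone()[0]


records_before = record_count(conn)
# Removing the aircraft cascades down to its maintenance records.
conn.execute("DELETE FROM aircraft WHERE serial_no = 'B238725737'")
records_after = record_count(conn)
```

After the aircraft has been deleted, no maintenance records remain for it, which is exactly the behavior the sample description asks for.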


The remaining business relationship types for our sample application domain can be found in Appendix A - Sample Problem Statement.

AIRCRAFT - MAINTENANCE RECORDS

For as long as an aircraft is owned by CAB, all maintenance records for the aircraft are kept. A maintenance record applies to one and only one aircraft. For an aircraft, there may be multiple maintenance records.

When the aircraft is removed from the list of aircraft, its maintenance records are deleted as well.

1 aircraft ~ 0 to many maintenance records (cascading relationship type)

1 maintenance record ~ 1 to 1 aircraft




Notes:

The fourth section of the problem statement should list all business constraints for the business object types and business relationship types of the application domain.

Business constraints represent restrictions that exist for the objects of business object types or the relationships of business relationship types or a mixture thereof. For example, such a restriction could require that for each (existing) business object of business object type O1 a corresponding business object of business object type O2 must exist. We will see further, more intuitive, examples for our sample airline application domain on the subsequent visuals.

For each business constraint, the problem statement should contain the following information:

• A textual description of the business constraint, i.e., of the restriction the business objects or business relationships involved must adhere to.

• The description should identify the business object types and/or business relationship types to whose objects or relationships the restriction applies, i.e., whose insert, update, or delete operations are limited by the business constraint. As mentioned before, a single business constraint can restrict the objects or relationships of a single or multiple business object types or business relationship types, or of a mixture thereof.

• The description should specify when the appropriate restriction is to be applied, i.e., what triggers the application of the restriction. This has two facets:

1. There may be circumstances or conditions attached to a business constraint specifying that the restriction is to be exercised only if these conditions are met. For example, the condition could specify that the restriction only concerns aircraft manufactured by Boeing or that the restriction only applies to aircraft put in service before January 1, 1985.

2. For the affected business object types or business relationship types, the description should specify the type of operations (insert, update, or delete) for which the constraint must be enforced provided that the aforementioned conditions are met.

• The description of the business constraint should specify the action to be performed when the constraint is violated. The simplest form of action is to reject the operation. However, there are more complex actions possible. For example, the violation of the constraint could trigger the creation of a business object for another business object type.

As you can see from the description, a business constraint may not only involve the business object types or business relationship types to which its restriction applies, but also other business object types or business relationship types for evaluating the triggering condition or for the action to be performed if the constraint is violated.

• A listing of all constraints for the business object types and/or business relationship types including:

  - A textual description of the constraint, i.e., the restriction that must be adhered to

  - The business object types and business relationship types to which the constraint applies (constraints may apply to a single or to multiple business object types or business relationship types, or to a mixture of business object types and business relationship types)

  - When the constraint is to be applied (when object or relationship is added, changed, or removed; under which circumstances/conditions)

  - Action to be performed when constraint is violated
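The items a problem statement should provide for each business constraint can be captured in a simple record. The Python sketch below is an assumption-laden illustration: the class `BusinessConstraint`, its field names, and the helper `missing_items` are hypothetical, and modeling "an engine is added" as an insert operation is likewise an assumption. The sample entry uses the engine constraint of the CAB application domain.

```python
from dataclasses import dataclass
from typing import Optional

OPERATIONS = ("insert", "update", "delete")


# Hypothetical record for one business constraint in a problem statement.
@dataclass(frozen=True)
class BusinessConstraint:
    name: str
    description: str            # the restriction that must be adhered to
    applies_to: tuple           # business object/relationship types restricted
    operations: tuple           # operations for which it must be enforced
    action_on_violation: str    # what happens when the constraint is violated
    condition: Optional[str] = None  # None: no particular condition attached


def missing_items(constraint):
    """Return the names of the required items that are absent or invalid."""
    missing = []
    if not constraint.description.strip():
        missing.append("description")
    if not constraint.applies_to:
        missing.append("applies_to")
    if not constraint.operations or any(
        op not in OPERATIONS for op in constraint.operations
    ):
        missing.append("operations")
    if not constraint.action_on_violation.strip():
        missing.append("action_on_violation")
    return missing


ENGINE_LIMIT = BusinessConstraint(
    name="NUMBER OF ENGINES ON AIRCRAFT",
    description="An aircraft cannot have more engines mounted than "
                "the aircraft model allows.",
    applies_to=("Aircraft",),
    operations=("insert",),  # assumption: adding an engine counts as an insert
    action_on_violation="Reject the request to add the engine.",
)
```

A check such as `missing_items` could help the database designer point out gaps to the application domain expert, who remains responsible for completing the description.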





Notes:

This visual illustrates a business constraint for our sample airline called Come Aboard. The business constraint limits the number of engines for an aircraft to the number of engines for the corresponding aircraft model.

The business object type to which the constraint applies is Aircraft. The restriction controls how many engines an aircraft can have.

There is no particular condition under which the constraint is to be applied. The constraint must be verified (enforced) whenever an engine is added to (mounted on) an aircraft.

The request to add an engine to an aircraft should be rejected if the limit for the corresponding aircraft model would be exceeded, i.e., if the constraint would be violated.

As mentioned before, the business constraint applies to business object type Aircraft. However, in order to verify it, business relationship type Aircraft Belongs to Aircraft Model and business object type Aircraft Models are needed.
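The following Python fragment sketches how this constraint could be verified. It is not a prescribed implementation: the dictionaries, the function `add_engine`, the engine identifiers, and the sample limit of two engines are all assumptions made for illustration. The sketch shows that the check applies to an aircraft but must follow the relationship to the aircraft model to find the limit.

```python
class ConstraintViolation(Exception):
    """Raised when adding an engine would exceed the limit of the model."""


# Hypothetical sample data, keyed as described in the problem statement.
ENGINES_ALLOWED = {("B737", "500"): 2}               # aircraft model -> limit
MODEL_OF_AIRCRAFT = {"B238725737": ("B737", "500")}  # aircraft -> its model
ENGINES_MOUNTED = {"B238725737": []}                 # aircraft -> engines


def add_engine(serial_no, engine_id):
    """Mount an engine on an aircraft, enforcing the engine limit."""
    # Follow the relationship "aircraft belongs to aircraft model".
    model = MODEL_OF_AIRCRAFT[serial_no]
    limit = ENGINES_ALLOWED[model]
    mounted = ENGINES_MOUNTED[serial_no]
    if len(mounted) >= limit:
        # Action if the constraint is violated: reject the request.
        raise ConstraintViolation(
            f"aircraft {serial_no} already has {limit} engines mounted"
        )
    mounted.append(engine_id)
```

The check runs only when an engine is added, mirroring the statement of when the constraint is to be applied.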


Business Constraints

The following constraints exist for the business object and relationship types that CAB wants to maintain in its database:

NUMBER OF ENGINES ON AIRCRAFT

An aircraft cannot have more engines mounted than the aircraft model allows.

To what the constraint applies: Business object type

When constraint is to be applied: To be enforced when an engine is added to an aircraft.

Action if constraint is violated: The request to add an engine to an aircraft must be rejected if it violates the constraint.






Notes:

The above business constraint requires that the captain and the copilot for a flight must be different.

This business constraint applies to the relationships of a business relationship type rather than to the objects of a business object type. It applies to business relationship type Pilot for Flight which interconnects business object types Pilots and Flights.

The corresponding restriction must be applied in two cases:

• When a pilot is assigned to a flight, i.e., when a new relationship for the business relationship type is added.

• When a pilot assignment is changed, i.e., an existing relationship for the business relationship type is changed. (You could also view this as the deletion of the old business relationship followed by the addition of a new business relationship.)

The pilot assignment (new or changed) should be rejected if captain and copilot would be the same pilot.
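As a minimal sketch of this constraint, the Python function below handles both cases with a single check, since a changed assignment simply overwrites the existing one. The function `assign_pilot`, the layout of the `assignments` dictionary, and the role names are assumptions made for this example.

```python
class ConstraintViolation(Exception):
    """Raised when captain and copilot of a flight would be the same pilot."""


def assign_pilot(assignments, flight, role, pilot):
    """Add or change the pilot assignment for one role of a flight.

    `assignments` maps a flight to a dictionary {role: pilot}.
    """
    if role not in ("captain", "copilot"):
        raise ValueError(f"unknown role: {role}")
    other_role = "copilot" if role == "captain" else "captain"
    crew = assignments.setdefault(flight, {})
    # The check only needs other relationships of the same type (Pilot for Flight).
    if crew.get(other_role) == pilot:
        raise ConstraintViolation(
            f"{pilot} is already {other_role} for flight {flight}"
        )
    crew[role] = pilot
```

A rejected request leaves the existing assignments untouched, because the check is made before anything is stored.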


Note that the business constraint applies to Pilot for Flight and also needs Pilot for Flight to check if the constraint has been violated.

CAPTAIN AND COPILOT MUST BE DIFFERENT

A pilot cannot be captain and copilot for the same flight.

To what the constraint applies: Business relationship type

To be enforced when a pilot is assigned to a flight or the pilot assignment is changed.

The pilot assignment must be rejected if the pilot does not qualify for the flight.






Notes:

The business constraint on this visual requires that the pilots for a flight must be licensed to fly the aircraft model used for the leg for the flight. This means that the above business constraint applies to the same business relationship type as the previous example: Pilot for Flight.

However, there is a peculiarity for this business constraint. The associated restriction is to be checked:

1. When a pilot is assigned to a flight or a pilot assignment is changed.

2. When the aircraft model for the leg for the flight is changed. As a consequence, pilots previously assigned to flights for the leg might no longer be licensed to fly the new aircraft model.

The point illustrated here is that a constraint may also have to be enforced when business objects or business relationships of another business object type or business relationship type are inserted, updated, or deleted. In case of the above example, the constraint has to be verified when a relationship of business relationship type Aircraft Model for Leg is changed.

As shown on the visual, the action to be performed when the constraint is violated depends on what caused the violation: the assignment of a pilot or the reassignment of an aircraft model.

PILOTS FOR FLIGHT MUST HAVE LICENSE FOR AIRCRAFT MODEL FOR LEG

A pilot for a flight must have the license to fly the aircraft model for the leg for the flight.

To what the constraint applies: Business relationship type

To be checked when a pilot is assigned to a flight or when a previous pilot assignment is changed. The pilot assignment is to be rejected if the pilot does not qualify for the flight.

Also to be verified if the aircraft model for a leg of an itinerary is changed. In this case, previous pilot assignments for flights for the leg must be canceled and appropriate notifications must be given.
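The two triggers and their different actions can be sketched as two separate functions that share one license check. The Python fragment below is only an illustration under stated assumptions: all dictionaries, identifiers such as `P1`, `F1`, and `LEG1`, and the function names are hypothetical, and a notification is represented simply as a (flight, pilot) pair.

```python
class ConstraintViolation(Exception):
    """Raised when a pilot assignment violates the license constraint."""


# Hypothetical sample data; all identifiers are invented for illustration.
LICENSES = {
    "P1": {("B737", "500")},
    "P2": {("B737", "500"), ("A320", "200")},
    "P3": {("A320", "200")},
}
MODEL_FOR_LEG = {"LEG1": ("B737", "500")}
LEG_OF_FLIGHT = {"F1": "LEG1"}
PILOTS_FOR_FLIGHT = {"F1": set()}


def is_licensed(pilot, model):
    """Return True if the pilot holds a license for the aircraft model."""
    return model in LICENSES.get(pilot, set())


def assign_pilot(flight, pilot):
    """Trigger 1: a pilot is assigned to a flight. Violation: reject."""
    model = MODEL_FOR_LEG[LEG_OF_FLIGHT[flight]]
    if not is_licensed(pilot, model):
        raise ConstraintViolation(f"{pilot} has no license for {model}")
    PILOTS_FOR_FLIGHT[flight].add(pilot)


def change_model_for_leg(leg, new_model):
    """Trigger 2: the aircraft model for a leg is changed.

    Violation: cancel the affected assignments and return the
    notifications to be given as (flight, pilot) pairs.
    """
    MODEL_FOR_LEG[leg] = new_model
    notifications = []
    for flight, flight_leg in LEG_OF_FLIGHT.items():
        if flight_leg != leg:
            continue
        # sorted() yields a list copy, so the set can be modified safely.
        for pilot in sorted(PILOTS_FOR_FLIGHT[flight]):
            if not is_licensed(pilot, new_model):
                PILOTS_FOR_FLIGHT[flight].discard(pilot)
                notifications.append((flight, pilot))
    return notifications
```

The same restriction is evaluated in both functions; only the reaction differs, which is the peculiarity this business constraint is meant to illustrate.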




Checkpoint


1.

a. The problem statement should give the database designer a basic understanding of the application domain.

b. The problem statement should allow the database designer to create an entity-relationship model for the application domain.

c. The problem statement is input for the database designer.

d. The problem statement is created by the database designer.

e. The database designer should assist the application domain expert in creating the problem statement.

f. The problem statement should describe the current implementation of the application domain.

g. The problem statement should be a global, functional, description of the application domain.

2. What are the main sections of a problem statement?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

3. What is the purpose of the overview section?

_____________________________________________________

_____________________________________________________


a. Business object types list the possible business objects for the application domain.

b. Business object types represent the categories of business objects that are important to the application domain.

c. All business objects belonging to the same business object type have the same meaning and characteristics.




5. For each business object type, the problem statement should describe how its objects can be identified. (T/F)

6. What is a business relationship type?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

7. List at least three things that a problem statement should describe for each business relationship type.

_____________________________________________________

_____________________________________________________

_____________________________________________________

8. For a mandatory business relationship type, there exists at least one business object type whose objects must always have (at least) one business relationship of that type. (T/F)

9. What do you call a business relationship type for which deleting an object of the business object type it is based on causes the deletion of all objects having a relationship with the deleted object?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

10. What is a business constraint?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________




11. List the items that the problem statement should contain for each business constraint.
_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

12. The checking of a business constraint can be triggered by insert, delete, or update operations for business object types or business relationship types other than the one to which the restriction applies. (T/F)





Notes:

Unit Summary

Problem Statement

• Global description of application domain

• Input for database design

• Established by application domain expert

- With assistance of database designer

• Describes business object types

- Textual description

- How to identify objects

• Describes relationship types between business objects

- Who with whom

- If always

- With how many

- What if partner deleted

• Describes business constraints

- For whom

- Under which circumstances, when, which reaction




Unit 4. Entity-Relationship Model

This unit describes how to establish an entity-relationship model visualizing and structuring the business object types of the application domain as entity types; illustrating the relationships between the business object types as relationship types; and modeling the constraints for the entity types and relationship types imposed by the business constraints. It describes the constructs needed to establish an entity-relationship model based on a sample application domain and discusses alternate solutions.



• Define the entity types for an application domain based on a problem statement.

• Define the relationship types for an application domain based on a problem statement.

• Define the supertypes and subtypes for an application domain.

• Identify the constraints for the entity types and relationship types of an application domain.

• Establish an entity-relationship model for an application domain.








Notes:

This unit describes how to develop an entity-relationship model for an application domain. You will learn to analyze a given problem statement and to define the entity types for the application domain represented by the problem statement. There are basic entity types, corresponding to the truly independent business object types, and dependent entity types, which are based on other entity types. Instances of a dependent entity type require the existence of corresponding instances of the entity type they are based upon.

Furthermore, you will learn to determine the relationship types for an application domain and to represent them in an entity-relationship model. Most of the relationship types of an entity-relationship model correspond to the business relationship types described in the problem statement for the application domain, but there will also be others. Most relationship types do not have attributes further describing them, but you will experience relationship types with attributes as well and learn how to represent their attributes in an entity-relationship model.

A special category of entity types are supertypes and subtypes, which are interconnected by so-called is-bundles (relationship types). They allow you to form categories and classify the represented entity instances, i.e., the objects represented by the entity types. Supertypes and subtypes are advanced modeling constructs.

A further topic of this unit is the constraints for the entity types and relationship types of an application domain. Most of the constraints are derived from the business constraints for the application domain, but there are also others.

Unit Objectives

• Define the entity types for an application domain based on a problem statement

- Basic and dependent entity types

• Define the relationship types for an application domain based on a problem statement

- With and without attributes

• Define the supertypes and subtypes for an application domain

• Identify the constraints for the entity types and relationship types of an application domain

• Establish an Entity-Relationship Model for an application domain

All elements discussed in this unit are ingredients of entity-relationship models. Thus, you will learn in this unit how to establish an entity-relationship model for an application domain.

During the unit, we will use the sample application domain for the airline company called Come Aboard (CAB) we have used in the previous unit. Based on the problem statement described in Appendix A - Sample Problem Statement, we will establish an entity-relationship model for our sample airline company. We will pick specific items of the problem statement and illustrate how they are modeled.



4.1 Entity Types



Figure 4-2. ER Model in Design Methodology CF182.0

Notes:

The development of an entity-relationship model for an application domain is the first step of the conceptual view during which the process- and implementation-independent architecture of the data of the application domain is established.

The application domain, described by the problem statement, is scrutinized for its business object types, the relationships between them, and for business constraints. As a result of the scrutiny, an entity-relationship model is established visualizing and structuring the business object types of the application domain as entity types; illustrating the relationship types between the entity types resulting from the business relationship types of the application domain; and modeling the constraints for the entity types and relationship types imposed by the business constraints.

As a general rule, the better the problem statement describing the application domain, the easier it will be to establish the corresponding entity-relationship model. Therefore, you should insist on a good problem statement as outlined in Unit 3 - Problem Statement and assist the domain expert in producing it.

ER Model in Design Methodology (diagram; labels shown: Problem Statement, Conceptual View, Data Inventory, Process Inventory, Tuple Types, Integrity Rules, Tables, Indexes)





Even if you have a good problem statement, you will most likely encounter items that are obscure. Consult the domain expert or the appropriate department of expertise to clarify the open issues. Do not make assumptions that rest on your own speculation rather than on knowledge of the application domain. They might be wrong!

The entity-relationship model is the basis for all subsequent design steps. A wrong assumption for the entity-relationship model will produce incorrect results for the subsequent steps. If the problem is detected during a later step, you must repeat all preceding steps, starting with the entity-relationship model or even the problem statement, and correct the erroneous results. Therefore, it is advisable to resolve open questions concerning the application domain with the competent people right away.

The development of the entity-relationship model is not a one-time affair. The entity-relationship model is maintained constantly and changed as the subsequent steps reveal errors or discover undocumented business object types, business relationship types, or business constraints. If the problems found concern undocumented items of the application domain or items not properly described in the problem statement, the problem statement must be corrected as well. It should be corrected by the domain expert.




Figure 4-3. Entity Types, Entity Instances, Attributes CF182.0

Notes:

As the name entity-relationship model suggests, entity types are one of the building blocks of an entity-relationship model. They constitute independent conceptual units representing classes of objects with the same meaning and characteristics about which information is to be stored and maintained.

Many of these classes derive themselves from the business object types for the application domain, but not necessarily all of them. Some of the entity types, especially those added later in the design process, are caused by design rules, e.g., the rules for avoiding redundancies in the information stored.

You should realize that an entity type represents a class of objects rather than a specific object. It is a conceptual category of items. The items may physically exist, such as an aircraft, or they may not physically exist and only be imaginary, such as an aircraft model. (An aircraft model does not physically exist, it only exists on paper, i.e., it is imaginary.)

An entity type must be an independent conceptual unit. This means that the objects of the entity type must have a conceptual meaning by themselves; that the information represented by the objects is understandable by itself; and that, from the application domain's point of view, it makes sense to process the information represented by the objects independently. For our sample airline, it makes sense to have an entity type PILOT representing the pilots of the company and providing information such as the name, age, and even shoe size of the pilots (if CAB provides the shoes for the pilots as part of their uniforms). However, it would not make sense to have an entity type combining the shoe size of pilots with the wing span of aircraft models because the individual objects of the entity type would not have a reasonable conceptual meaning.

The term independent in this context does not mean that the objects of an entity type are completely unrelated to objects of other entity types. On the contrary, in a real-life entity-relationship model, there are many interconnections between the objects of the various entity types. An entity type that has no associations with other entity types is fairly suspicious. For our sample airline company, PILOT and AIRCRAFT MODEL are two apparent entity types that make sense on their own but are nevertheless interrelated: Pilots have licenses to fly aircraft models.

Entity Types, Entity Instances, Attributes

Entity Type: An independent conceptual unit representing a class of objects with the same meaning and characteristics about which information is to be stored and maintained

Entity (Instance): An actual object belonging to an entity type

Attribute: A conceptual piece of information with a distinct meaning stored for the instances of an entity type, not actual values

Up to now, we talked about the objects belonging to an entity type. In modeling terminology, the actual objects belonging to an entity type are referred to as instances of the entity type, entity instances, or simply entities.

The term attribute is used to denote a conceptual piece of information with a distinct meaning stored for the instances of an entity type. Attributes represent the conceptual type of information stored, such as last name, and not actual values (such as MILLER for last name). Therefore, it would be better to talk about attribute types, but this is not the terminology generally used.

Attributes represent partial information for an entity type. Whether several pieces of information together form a distinct entity type or just a set of attributes of a larger entity type depends on their importance for the application domain and their independence. For example, addresses consisting of country, state, city, and street only represent a set of attributes for the pilots of our sample airline company. They would form a separate entity type, identifying buildings, for a shipping company.




Figure 4-4. Properties of Entity Types and Attributes CF182.0

Notes:

Entity types and attributes have the following properties:

• Each entity type receives a unique name. This should be the generic class name expressing the function of the instances of the entity type. By convention, the class name is used in the singular form, as is done in biological classification, where you talk about the class Human Being and not about the class Human Beings. The name is written in all capital letters. Since the name for an entity type is used for reference purposes, it must be unique.

For our sample airline application domain, entity types are for example: AIRCRAFT MODEL, AIRCRAFT, PILOT, and AIRPORT.

• For all instances of an entity type, the same common characteristics are stored. Primarily, this means that the same attributes are stored. However, as we will see later on, it also means the same types of relationships and/or constraints are recorded.

• The attributes stored for the instances of an entity type have a direct bearing on the meaning of the entity type. In other words, attributes are not stored for the instances of an entity type if they have nothing to do with the semantics of the entity type. For example, Wing Span should not be an attribute of entity type PILOT. It has nothing to do with pilots. It is an attribute of aircraft models and, thus, of entity type AIRCRAFT MODEL.

Obvious as this statement seems, attributes are assigned to the wrong entity type again and again.

Properties of Entity Types and Attributes

• Each entity type receives a unique name

- Should be generic class name in singular form

- Examples: AIRCRAFT MODEL, PILOT, AIRPORT

• For all entity instances, the same common characteristics (attributes) are stored

• The attributes stored for the instances of an entity type have a direct bearing on the meaning of the entity type

• The attributes for an entity type receive a unique name

- Should be the descriptive term for the characteristics

- Examples: Manufacturer, Last Name, Airport Code

• For an entity instance, an attribute may assume no, one, or multiple values

- All having the same meaning

- Minimum and maximum number of values depends on entity type

• Attributes can be elementary (indivisible) or composite (have components)

• Every entity type must have a set of one or more attributes whose values together uniquely and permanently identify its (possible) entity instances

- Entity key

• The attributes for an entity type receive a unique name. The name should clearly express the meaning of the attribute, i.e., the characteristic it represents. The name of an attribute may consist of multiple words. We will start each word with a capital letter except for connecting words such as of, for, or and.

Examples of attributes are:

For entity type AIRCRAFT MODEL: Number of Engines, i.e., the number of engines for an aircraft model.

Manufacturer, i.e., the company manufacturing an aircraft model.

For entity type AIRCRAFT: Aircraft Number, i.e., the unique serial number identifying an aircraft.

Seat, i.e., the seats on an aircraft.

For entity type PILOT: Last Name, i.e., the last name of a pilot.

For entity type AIRPORT: Airport Code, i.e., the three-letter designator used in aviation for the various airports.

• For an entity instance, an attribute may assume no, one, or multiple values (an array of values). However, all values assumed have the same meaning, namely, the meaning imposed by the attribute. For example, attribute Seat for entity type AIRCRAFT may assume multiple values: one for each seat on the particular aircraft. How many values an attribute must assume at least or at most depends on the entity type.

• Attributes can be elementary or composite. From the perspective of the application domain, the values of an elementary attribute are (logically) indivisible. This means they cannot be subdivided into smaller units that, by themselves, have a meaning for the application domain. Thus, they are not structured. For our sample airline company, examples of elementary attributes are: the serial number for an aircraft and the number of engines for an aircraft model.

In contrast, composite attributes consist of components. This means their values can be decomposed into smaller units that have a meaning of their own for the application domain. All values have the same structure imposed by the components. The components of a composite attribute are logically related. They can be elementary attributes or, in turn, composite attributes. Each value of a composite attribute for an entity instance is composed of the appropriate values of its components for the entity instance.




For our sample airline company, attribute Manufacturer mentioned above is a composite attribute for entity type AIRCRAFT MODEL. Some of its components are: Manufacturer Code, Name of Manufacturer, Address, and Phone Number. Address is again a composite attribute.

It depends on the application domain whether an attribute is elementary or composite. For Come Aboard, Name of Person would be a composite attribute for entity type PILOT consisting of elementary attributes Last Name, First Name, and Middle Initial. For a different application domain, the entire name may be considered indivisible since last name, first name, and middle initial are not important as separate pieces of information and are not identifiable.

The elementary attributes for an entity type are the actual carrier of information. Composite attributes just group attributes. The information they represent is the information of their components. If an entity type contains a composite attribute, it means that it contains all components of the composite attribute.

Composite attributes are not absolutely necessary. However, they are very helpful in identifying information of an entity type that logically belongs together and referring to it.

• It is an absolute requirement that the instances of an entity type be uniquely and reliably identifiable. Therefore, every entity type must have a set of one or more attributes uniquely and permanently identifying all possible entity instances for the entity type. Such a set of attributes is referred to as the entity key.

For the entity key, it is not sufficient that its attributes uniquely identify the possible entity instances. It is also required that all attributes of the set are necessary for the unique identification, i.e., that none of the attributes can be omitted without losing the unique identifiability. This is referred to as the minimum principle for entity keys.

It is conceivable that an entity type has multiple potential entity keys. If so, you should choose the one that, with regard to the application domain, most naturally represents the entity instances. If several candidates are equally natural, choose the one with the fewest attributes since this eases referencing the entity instances. (Note that different candidate entity keys may have different numbers of attributes.)
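
The uniqueness requirement and the minimum principle can be illustrated with a short Python sketch. It is not part of the course material: the function names and the sample instances (including the B747 200 and A310 200 rows) are assumptions made for illustration. Keep in mind that a check against sample instances can only refute a candidate key. The requirement applies to all possible instances, which only knowledge of the application domain can confirm.

```python
def identifies_uniquely(instances, attributes):
    """True if no two of the given instances agree on all listed attributes."""
    seen = set()
    for instance in instances:
        key_value = tuple(instance[a] for a in attributes)
        if key_value in seen:
            return False
        seen.add(key_value)
    return True


def satisfies_minimum_principle(instances, attributes):
    """True if the attributes are unique and none of them can be omitted.

    Dropping one attribute at a time is sufficient: if a smaller subset
    were unique, every superset of it would be unique as well.
    With fewer than two instances the result is not meaningful.
    """
    if not identifies_uniquely(instances, attributes):
        return False
    return all(
        not identifies_uniquely(instances, [a for a in attributes if a != dropped])
        for dropped in attributes
    )


# Hypothetical sample instances of entity type AIRCRAFT MODEL.
models = [
    {"Type Code": "B747", "Model Number": "400", "Category": "Jet"},
    {"Type Code": "B747", "Model Number": "200", "Category": "Jet"},
    {"Type Code": "A310", "Model Number": "300", "Category": "Jet"},
    {"Type Code": "A310", "Model Number": "200", "Category": "Jet"},
]

print(satisfies_minimum_principle(models, ["Type Code", "Model Number"]))
print(satisfies_minimum_principle(models, ["Type Code", "Model Number", "Category"]))
```

For these samples, Type Code together with Model Number passes both checks, whereas adding Category keeps the set unique but violates the minimum principle because Category can be omitted.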





Figure 4-5. Representation of Entity Types CF182.0

Notes:

In an entity-relationship model, entity types are represented as rectangles. Most of the time, the rectangle for an entity type just contains the name of the entity type because of the limited size of the drawing area available for the entity-relationship model. This representation is referred to as standard representation of the entity types.

A more detailed representation of an entity type includes attributes for the entity type. In this case, the rectangle contains a header separated from the rest of the information by a horizontal line. The header contains the name of the entity type. Below the header, the attributes for the entity type are listed. For a composite attribute, its components may be shown as well and are indented to identify them as components.

The attributes belonging to the entity key are preceded by the letter K, the nonkey attributes are not. If a composite attribute (i.e., all its components) belongs to the entity key, it is marked appropriately and not its components. This representation of entity types is referred to as attribute representation of the entity types.

To make the key attributes more visible, their names are italicized throughout this document, especially when representing entity instances.

Representation of Entity Types

Standard Representation

Entity Type
AIRCRAFT MODEL

Attribute Representation

AIRCRAFT MODEL
K Type Code
K Model Number
Category
Dimensions
  Length
  Wing Span
  Height

Entity Instances

AIRCRAFT MODEL
Type Code: B747
Model Number: 400
Category: Jet
Dimensions
  Length: 70.67 m
  Wing Span: 64.31 m
  Height: 19.33 m

AIRCRAFT MODEL
Type Code: A310
Model Number: 300
Category: Jet
Dimensions
  Length: 46.67 m
  Wing Span: 43.90 m
  Height: 15.81 m




Frequently, only a few sample attributes are shown to restrict the size of the rectangle. In general, the illustrated attributes include the attributes of the entity key. Often, only the attributes of the entity key are shown.

Because it tends to reduce the clarity of the entity-relationship model and because of the limited size of the drawing area, generally, the attribute representation is only used if:

• the attributes are important for the understanding

• a small portion of the entity-relationship model is illustrated.

In general, tools only show the standard representation, i.e., the rectangles with the names. If you click with the mouse on the rectangle for an entity type, a separate window is opened providing a textual description of the entity type and listing the attributes as far as they have been entered. A similar approach can be applied when using paper for the entity-relationship model: The entity-relationship model is drawn in standard representation and, for each entity type, a page is added providing details about the entity type including the name, a textual description, and the attributes known.

Sometimes, it is desirable to illustrate a few sample instances for an entity type. An entity instance is represented as a rectangular box with a header containing the name of the entity type. The header is followed by a line for each desired attribute. For an elementary attribute, the name of the attribute is followed by a colon (:) which, in turn, is followed by the values of the attribute for the represented entity instance. If the attribute assumes multiple values for the entity instance, the values are separated by commas.

If the components for a composite attribute are shown, the line for a composite attribute only contains the name of the composite attribute. The components of composite attributes are indented as for the attribute representation. If the components for a composite attribute are not shown, the line for a composite attribute has the same format as a line for an elementary attribute. The components of a value are enclosed in parentheses and separated by commas.

As described for the attribute representation, we will italicize the lines (name and values) for the key attributes throughout this course. If a composite attribute belongs to the entity key and its components are shown, only the line for the composite attribute is italicized.

The examples on the visual list both key and nonkey attributes. Only a subset of the attributes for entity type AIRCRAFT MODEL is shown. Generally, when developing the initial entity-relationship model, you do not know all attributes for the entity types yet. However, you should know the entity keys! If the key for an entity type cannot be derived from the description of the related business object type in the problem statement, contact the domain expert to identify the entity key.

The visual illustrates both the standard representation and the attribute representation for entity type AIRCRAFT MODEL. It also shows two entity instances, a Boeing 747, Model 400, and an Airbus 310, Model 300.
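
For readers who think in code, the attribute representation and the two entity instances can be mirrored in a small Python sketch. It is only an analogy under stated assumptions, not a step of the design methodology: the class and field names, the `KEY` tuple, and the helper functions are hypothetical. The composite attribute Dimensions is modeled as a nested class, and the helper `elementary_attributes` shows that a composite attribute carries no information beyond that of its components.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class Dimensions:
    """Composite attribute: groups three elementary attributes."""
    length_m: float
    wing_span_m: float
    height_m: float


@dataclass(frozen=True)
class AircraftModel:
    """Entity type AIRCRAFT MODEL (only a subset of its attributes)."""
    type_code: str        # key attribute (K)
    model_number: str     # key attribute (K)
    category: str
    dimensions: Dimensions

    # Not a dataclass field (no annotation): names of the entity key attributes.
    KEY = ("type_code", "model_number")


def entity_key(instance):
    """Return the values of the key attributes for one entity instance."""
    return tuple(getattr(instance, name) for name in instance.KEY)


def elementary_attributes(instance, prefix=""):
    """Yield (name, value) pairs, replacing composites by their components."""
    for f in fields(instance):
        value = getattr(instance, f.name)
        if is_dataclass(value):
            yield from elementary_attributes(value, prefix + f.name + ".")
        else:
            yield prefix + f.name, value


# The two entity instances shown on the visual.
b747_400 = AircraftModel("B747", "400", "Jet", Dimensions(70.67, 64.31, 19.33))
a310_300 = AircraftModel("A310", "300", "Jet", Dimensions(46.67, 43.90, 15.81))

for name, value in elementary_attributes(b747_400):
    print(f"{name}: {value}")
```

The class corresponds to the entity type, each object to an entity instance, and each field to an attribute, not to an actual value.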





Figure 4-6. Determining the Entity Types (1 of 2) CF182.0

Notes:

Since the entity-relationship model must reflect the application domain and visualize its business object types, the problem statement for the application domain constitutes the primary source for determining the entity types. For each business object type of the application domain, there is normally an entity type in the entity-relationship model. The entity types derived this way are usually referred to as basic entity types since they are inherent (basic) to the application domain.

This illustrates how important it is that a good problem statement is available when the modeling begins. Therefore, the database designer should insist on a good problem statement being established by the domain expert (with the help of the database designer to ensure that it contains the proper information). The better the problem statement, the easier it is to develop the corresponding entity-relationship model.

You should realize that the final entity-relationship model will contain additional entity types that were not apparent from the problem statement. For some of these entity types, the corresponding business object types were simply forgotten in the problem statement, and the problem statement should be corrected accordingly by the domain expert. Other entity types were part of more complex business object types and must be separated out. We will see such cases later in this unit. For them, you should also request an update of the problem statement by the domain expert.

Determining the Entity Types (1 of 2)

• Primary source is problem statement for application domain

• Business Object Type -> (Basic) Entity Type

• BUT . . . There will be others as the design progresses!!!

Furthermore, structuring requirements of the later steps of the design process may introduce additional entity types. The entity-relationship model is updated as these entity types are found. You may rightfully ask whether these entity types have corresponding business object types. In many cases (if not all), they indeed should. However, these business object types are frequently not immediately obvious to the domain expert because they play a secondary role from the perspective of the application domain. It is highly advisable that the database designer discuss these entity types with the domain expert and convince him/her to update the problem statement accordingly.





Figure 4-7. Determining the Entity Types (2 of 2) CF182.0

Notes:

This visual lists a set of questions you might want to ask yourself before accepting something (a business object type) as entity type:

• Considered by themselves, do the instances of the candidate entity type have a reasonable meaning for the application domain?

The term by themselves emphasizes the independence of the instances. Further subquestions leading to the answer are:

- What would be the generic class name and does it make sense in the context of the application domain? Does it indeed represent a conceptual entity compatible with the application domain?

- Would the application domain conceivably process the instances on their own, i.e., do the instances have a meaning by themselves, or are the instances only meaningful when processed together with the instances of another entity type? In the latter case, you may rather be dealing with a subset of attributes for the other entity type and not with a separate entity type.

Determining the Entity Types (2 of 2)

Ask yourself the following questions:

• Considered by themselves, do the entity instances have a meaning for the application domain?

- What is the generic class name?

- Would the application domain conceivably process the instances on their own?

• How can the entity instances uniquely be identified?

- What is the entity key?

• Do the entity instances also have nonkey attributes?

• Will there be eventually multiple instances of that type for the application domain?

If you can answer all questions affirmatively, it is most likely an entity type

If necessary, go back and ask the domain expert!




• How would the instances of the candidate entity type be uniquely identified, i.e., what would be the entity key?

You should be able to find a set of attributes that identifies the entity instances in a manner natural to the application domain.

• Do the instances of the candidate entity type also have nonkey attributes?

It is possible, but very rare, that all attributes of an entity type belong to the entity key. Therefore, you should be suspicious if there are not any nonkey attributes.

• Will the candidate entity type eventually contain multiple instances or will there always be only a single instance for the entity type?

Again, it is possible that an entity type will always contain just a single entity instance (something like a control record), but it is very unusual and should make you suspicious.

If the problem statement does not contain the answers to the above questions, go back to the domain expert or the appropriate department of expertise. Do not make unfounded assumptions!

If you can answer all the above questions satisfactorily and affirmatively, the candidate entity type is most likely a real entity type. However, you should be aware that the affirmative answers only provide clues and not proofs that something is an entity type. That is because the entity types depend on the application domain.
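
The questions above can be kept as a simple review checklist. The following Python sketch is one possible way to record the answers for a candidate. It is an illustration only: the class `Candidate`, its fields, and the wording of the concerns are hypothetical, and an empty list of concerns is a clue, not a proof, that the candidate is an entity type.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Answers collected for one candidate entity type (hypothetical structure)."""
    name: str
    meaningful_on_its_own: bool
    entity_key: tuple[str, ...]
    nonkey_attributes: tuple[str, ...]
    expects_multiple_instances: bool


def review(candidate: Candidate) -> list[str]:
    """Return the points that should make the designer suspicious."""
    concerns = []
    if not candidate.meaningful_on_its_own:
        concerns.append("instances have no meaning by themselves")
    if not candidate.entity_key:
        concerns.append("no entity key identified; ask the domain expert")
    if not candidate.nonkey_attributes:
        concerns.append("no nonkey attributes (possible, but rare)")
    if not candidate.expects_multiple_instances:
        concerns.append("only a single instance expected (possible, but unusual)")
    return concerns


aircraft_model = Candidate(
    name="AIRCRAFT MODEL",
    meaningful_on_its_own=True,
    entity_key=("Type Code", "Model Number"),
    nonkey_attributes=("Category", "Dimensions"),
    expects_multiple_instances=True,
)
shoe_size_with_wing_span = Candidate(
    name="SHOE SIZE WITH WING SPAN",
    meaningful_on_its_own=False,
    entity_key=(),
    nonkey_attributes=("Shoe Size", "Wing Span"),
    expects_multiple_instances=True,
)

for candidate in (aircraft_model, shoe_size_with_wing_span):
    print(candidate.name, review(candidate))
```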





Figure 4-8. Entity Types - A Piece of Advice CF182.0

Notes:

Since the entity-relationship model is the basis for all further steps of the design process, you should establish the entity types very carefully. However, you should not linger on endlessly. Projects have failed in the past because the participants fought endlessly over what the entity types for the application domain were.

At such an early stage of the design process, you may not have all information to be a hundred percent certain of the entity types, especially since the problem statement does not list all data elements yet that play a role for the application domain. It only lists a few sample data elements for each business object type.

Despite all good intentions when writing the problem statement, some entity types may be hidden and only emerge in the subsequent steps or when all data elements are compiled.

Conversely, some of the entity types may have been overrated and prove not to be entity types after all as more information becomes available during the subsequent steps and you become more familiar with the application domain.

Entity Types - A Piece of Advice

You should establish the entity types carefully

However:

Do not linger on endlessly!!!

Remember: It is an iterative process!!!

You may not have all information yet

Some entity types may be hidden and reveal themselves in the subsequent steps

Some entity types may turn out not to be entity types as you get more information and become wiser




Thus, establish the entity types carefully, but continue on to the next steps of the design after you feel confident with what you have done. Remember that the design methodology used in this course represents an iterative approach allowing you to continuously improve the entity-relationship model and the dependent results.





Figure 4-9. Entity Types for CAB CF182.0

Notes:

This visual illustrates the entity types for our sample airline company called Come Aboard. The entity types were derived from the problem statement in Appendix A - Sample Problem Statement. Since this is a fairly good problem statement, there is an entity type for each business object type. However, later in this unit, we will see that some additional entity types will have to be added.

The entity types have the following entity keys:

Entity Type           Entity Key

AIRCRAFT MODEL        Type Code, Model Number
AIRCRAFT              Aircraft Number
PILOT                 Employee Number
MECHANIC              Employee Number
AIRPORT               Airport Code
ITINERARY             Flight Number

Entity Types for CAB

[Visual: the entity types PILOT, MECHANIC, ITINERARY, FLIGHT, AIRPORT, AIRCRAFT, AIRCRAFT MODEL, and MAINTENANCE RECORD]



Entity Type           Entity Key

FLIGHT                Flight Number, From (airport of departure), To (airport of arrival), Flight Locator
MAINTENANCE RECORD    Maintenance Number




4.2 Relationship Types



Figure 4-10. Relationship Types Between Entity Types CF182.0

Notes:

As the name already suggests, relationship types form the second component of entity-relationship models. Initially, we will concentrate on relationship types between entity types. Later, we will expand, i.e., generalize, the relationship type definition given here.

A relationship type (between entity types) is a conceptual association between the entity instances, one each, of two not necessarily different entity types. Thus, it describes a class of interrelationships, having the same characteristics, connecting the entity instances of two entity types.

The terms relationship instance and relationship are used to denote a specific interrelationship of a given relationship type between specific instances of the entity types for the relationship type.

Relationship instances of the same relationship type always interconnect instances of the same respective entity types. If r1 and r2 are relationship instances of the same relationship type and r1 associates an instance of entity type E1 with an instance of entity type E2, then r2 must also interconnect an instance of E1 to an instance of E2. Furthermore, r1 and r2 must have the same meaning and characteristics.

Relationship Types Between Entity Types

Relationship Type

A conceptual association between the entity instances, one each, of two not necessarily different entity types

Relationship (Instance)

A specific interrelation of a given relationship type between specific entity instances of the entity types for the relationship type

Relationship type = Entirety of all relationships with the same meaning




Taking this into account, you can conceive of a relationship type as the entirety of all (potential) relationships, with the same meaning, between entity instances of two (not necessarily different) entity types.

By definition, relationship instances exist only as long as the instances they interconnect exist. If one of these instances is deleted, the relationship instance no longer exists.

By definition, relationship types are binary in the sense that their instances always interconnect two entity instances. At first glance, this seems to be restrictive, but it will prove not to be the case when the relationship type definition is extended later in this unit.

Please note the similarity of the relationship type definition to the definition of business relationship types given in Unit 3 - Problem Statement. Therefore, you may already suspect that the business relationship types will be the primary source for the relationship types of the entity-relationship model. This is indeed the case, but there will be additional relationship types as we will see later in this unit.
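The definition can be made concrete with a small Python sketch. The names used here (`EntityInstance`, `RelationshipType`, `connect`) and the sample key values are illustrative assumptions and not part of the course methodology. The sketch only demonstrates that every relationship instance is binary and that all instances of one relationship type connect instances of the same respective entity types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityInstance:
    entity_type: str   # e.g. "PILOT"
    key: tuple         # values of the entity key


@dataclass
class RelationshipType:
    source_type: str
    name: str          # name of the relationship type, e.g. "_can_fly_"
    target_type: str
    instances: set = field(default_factory=set)

    def connect(self, source: EntityInstance, target: EntityInstance) -> None:
        # Every relationship instance connects exactly two instances, and
        # they must belong to the entity types of the relationship type.
        if source.entity_type != self.source_type:
            raise TypeError(f"source must be an instance of {self.source_type}")
        if target.entity_type != self.target_type:
            raise TypeError(f"target must be an instance of {self.target_type}")
        self.instances.add((source, target))


# Sample values chosen for illustration only.
can_fly = RelationshipType("PILOT", "_can_fly_", "AIRCRAFT MODEL")
miller = EntityInstance("PILOT", ("0491337",))
b747_400 = EntityInstance("AIRCRAFT MODEL", ("B747", "400"))
can_fly.connect(miller, b747_400)
```

Calling `can_fly.connect(b747_400, miller)` raises a `TypeError` in this sketch because the two instances do not match the entity types of the relationship type.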




Figure 4-11. Relationship Types in ER Model CF182.0

Notes:

In the entity-relationship model, the entity types for a relationship type are interconnected.

Since relationship types are binary by definition, as explained before, each relationship type can be viewed from two directions. One of the directions is referred to as the primary direction, the other as the inverse direction. The term primary seems to indicate that one of the directions is more important than the other. From a data modeling perspective, this is not the case, and it is irrelevant which direction is chosen as the primary direction. From an application point of view, you may want to choose as the primary direction the one that is more important application-wise.

In the above example, the relationship type interconnecting the entity types PILOT and AIRCRAFT MODEL can be looked at from PILOT's point of view meaning that a pilot can fly an aircraft model. The relationship type can also be looked at from AIRCRAFT MODEL's point of view. Then, the meaning is that an aircraft model can be flown by a pilot. As expected, the meanings are complementary. Let us choose the direction from PILOT to AIRCRAFT MODEL as the primary direction.

Relationship Types in ER Model

[Visual: PILOT is connected to AIRCRAFT MODEL by an arrow for the primary direction. The arrow is labeled _can_fly_ (name for primary direction) and (_can_be_flown_by_) (name for inverse direction). PILOT is the source for the primary direction and the target for the inverse direction; AIRCRAFT MODEL is the target for the primary direction and the source for the inverse direction.]




In the entity-relationship model, the primary direction of a relationship type is indicated by an arrow specifying the direction of the view.

To allow referencing them, all relationship types are uniquely named. More precisely, each direction receives a unique name. In the entity-relationship model, the names for the directions are placed next to the connecting arrow and the name for the inverse direction is enclosed in parentheses. This convention together with the arrow for the primary direction allows you to understand and interpret the relationship type correctly from the entity-relationship model.

When talking about a direction of a relationship type, it makes sense to talk about the source and the target of the direction. The source of the direction is the entity type from which you look at the relationship type. The target is the opposite entity type.

In the example on the visual, _can_fly_ (more precisely, PILOT_can_fly_AIRCRAFT MODEL as we will see in a minute) is the name of the primary direction. PILOT is the source for the primary direction and AIRCRAFT MODEL its target. For the inverse direction, the name is _can_be_flown_by_ (more precisely, AIRCRAFT MODEL_can_be_flown_by_PILOT); AIRCRAFT MODEL is the source; and PILOT is the target.

From a data modeling point of view, it is only important to be able to identify the relationship type as such and not the various directions. For this, it is sufficient to list a single name for the relationship type in the entity-relationship model. For simplicity, the name of the primary direction is used since it does not require the enclosing parentheses.

People often talk about the source and target of a relationship type without mentioning a specific direction. In this case, the source and the target of the primary direction are meant. We will follow this convention as well throughout this course.

As mentioned before, the directions of a relationship type receive unique names. To avoid overly lengthy names in the entity-relationship model, we are using the following naming convention throughout the course:

• The full name for a direction always starts with the name of the source followed by an underscore and always ends with the name of the target preceded by an underscore. All words in between the source and target names are separated by underscores rather than blanks.

• In the entity-relationship model, only the part of the name is shown that follows the name of the source and precedes the name of the target. The names of the source and the target are not shown. Thus, the illustrated name portion (abbreviated name) always starts with an underscore and always ends with an underscore signaling the absence of the source and target names.

This convention allows us to use the same abbreviated name in the entity-relationship model for the directions of different relationship types, or for both directions of a relationship type, and still be able to determine their full unique names.




In addition to illustrating the relationship types in the entity-relationship model, you should provide a detailed description for them on a separate piece of paper including the names for both directions, the names of their sources and targets, and a textual description of the meaning of the relationship type. When following the above naming convention, the names of the source and target for a direction are implicitly identified and need not be specified explicitly.
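The naming convention is mechanical enough to be sketched in a few lines of Python. The function names `full_name` and `abbreviated_name` are assumptions made for this illustration; only the convention itself comes from the course.

```python
def full_name(source: str, abbreviated: str, target: str) -> str:
    """Build the full name of a direction from the abbreviated name in the model."""
    if not (abbreviated.startswith("_") and abbreviated.endswith("_")):
        raise ValueError("abbreviated names start and end with an underscore")
    return f"{source}{abbreviated}{target}"


def abbreviated_name(full: str, source: str, target: str) -> str:
    """Strip the source and target names from the full name of a direction."""
    if not (full.startswith(source + "_") and full.endswith("_" + target)):
        raise ValueError("full name must start with the source and end with the target")
    return full[len(source):len(full) - len(target)]


primary = full_name("PILOT", "_can_fly_", "AIRCRAFT MODEL")
inverse = full_name("AIRCRAFT MODEL", "_can_be_flown_by_", "PILOT")
print(primary)   # PILOT_can_fly_AIRCRAFT MODEL
print(inverse)   # AIRCRAFT MODEL_can_be_flown_by_PILOT
```

Because the source and target names are part of the full name, the same abbreviated name (for example, _for_) can appear several times in one model without making the full names ambiguous.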





Figure 4-12. Relationship Instance Diagram CF182.0

Notes:

Relationship instance diagrams are a useful means to illustrate a relationship type by example. They cannot replace entity-relationship models. They can only help to better visualize small parts of an entity-relationship model by means of examples.

In a relationship instance diagram, sample entity instances are interconnected by named arrows in the manner intended by the subject relationship type.

The topmost part of the above visual shows how the relationship type is represented in an entity-relationship model. The representation is followed by a relationship instance diagram for the relationship type. The relationship instance diagram shows that pilot Miller, Jack (employee number 0491337) can fly Boeing 747, Model 400 (type code B747, model number 400), and Airbus 340, Model 100 (type code A340, model number 100). Pilot Smith, Joe can fly Airbus 340, Model 100, and Airbus 310, Model 300.

Relationship Instance Diagram

[Visual, upper part: PILOT _can_fly_ (_can_be_flown_by_) AIRCRAFT MODEL]

[Visual, lower part: sample instances interconnected by _can_fly_ arrows]

PILOT instances:

Employee Number   Last Name   First Name
0491337           Miller      Jack
1662951           Smith       Joe

AIRCRAFT MODEL instances:

Type Code   Model Number   Cruising Speed
B747        400            930 km/h
A340        100            890 km/h
A310        300            860 km/h

_can_fly_ connections: Miller to B747 400 and to A340 100; Smith to A340 100 and to A310 300




Figure 4-13. Multiple Relationship Types for Entity Types CF182.0

Notes:

The above visual demonstrates that there may exist multiple different relationship types between two entity types underlining why it is important to name the relationship types (more precisely, their directions). The names allow you to differentiate the various relationship types.

As described by the problem statement for our sample airline company called Come Aboard, to each flight, one pilot is assigned as (flight) captain and another as copilot. This gives rise to two relationship types between entity types PILOT and FLIGHT:

PILOT_captain_for_FLIGHT
PILOT_copilot_for_FLIGHT

The upper part of the picture illustrates their representation in an entity-relationship model. Only the primary names are shown for the relationship types. For better distinguishability, the connecting arrow for PILOT_copilot_for_FLIGHT has been dotted. This does not imply a special meaning.

Multiple Relationship Types for Entity Types

[Visual, upper part: PILOT is connected to FLIGHT by two arrows, _captain_for_ and _copilot_for_ (dotted).]

[Visual, lower part: relationship instance diagram]

PILOT instances:

Employee Number   Last Name   First Name
0491337           Miller      Jack
1662951           Smith       Joe
0844092           Ferguson    Jane

FLIGHT instances:

Flight Number   From   To    Flight Locator   Planned Departure Date   Departure Time
YY1842          FRA    JFK   453              1999-07-21               10:30
YY2843          ATL    SJC   210              1999-08-01               16:35

Connections: Miller _captain_for_ YY1842; Smith _captain_for_ YY2843; Smith _copilot_for_ YY1842; Ferguson _copilot_for_ YY2843




The lower part of the picture illustrates a relationship instance diagram comprising the two relationship types. Pilot Miller, Jack (employee number 0491337) is captain for flight YY1842, flight locator 453, from Frankfurt (FRA) to New York Kennedy airport (JFK). Pilot Smith, Joe (employee number 1662951) is captain for flight YY2843, flight locator 210, from Atlanta (ATL) to San Jose, California (SJC). Smith, Joe is also copilot for the flight for which Miller, Jack is captain. Pilot Ferguson, Jane is copilot for captain Smith's flight from Atlanta to San Jose.

The requirement that captain and copilot for a flight must be different cannot be modeled by these relationship types. It must be expressed by means of constraints discussed later.
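As a rough illustration of why this requirement lives outside the two relationship types, the following Python sketch stores the instances of both relationship types as sets of pairs and checks the requirement separately. The representation as pairs and the function name `flights_violating_constraint` are assumptions made for this example; the course introduces its own means for constraints later.

```python
# Each relationship instance is a pair (employee number, flight key).
# A flight key is (flight number, from, to, flight locator).
FLIGHT_1 = ("YY1842", "FRA", "JFK", 453)
FLIGHT_2 = ("YY2843", "ATL", "SJC", 210)

captain_for = {("0491337", FLIGHT_1), ("1662951", FLIGHT_2)}
copilot_for = {("1662951", FLIGHT_1), ("0844092", FLIGHT_2)}


def flights_violating_constraint(captains, copilots):
    """Return the flights for which the same pilot is both captain and copilot."""
    return {flight for (pilot, flight) in captains if (pilot, flight) in copilots}


print(flights_violating_constraint(captain_for, copilot_for))   # set()
```

Neither set on its own can detect a violation; the check needs both relationship types at once, which is why it is a constraint across relationship types rather than a property of either one.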




Figure 4-14. Unary Relationship Types CF182.0

Notes:

The problem statement for our sample airline company states that itineraries consist of ordered collections of nonstop connections between airports. As the term connection implies, nonstop connections are relationships between two airports: an airport has a nonstop connection to another airport. As indicated on the visual, the abbreviated name for the relationship type is _nonstop_to_. Accordingly, the full name is AIRPORT_nonstop_to_AIRPORT. The first airport is the airport of departure, the second airport the airport of arrival.

Even though the individual relationship instances are binary in that they interconnect two entity instances, a relationship type interconnecting instances of the same entity type is referred to as a unary relationship type.

The upper part of the visual illustrates the representation of a unary relationship type in an entity-relationship model: the arrow returns to the entity type it starts from.

The lower part of the visual illustrates instances for the represented relationship type. Atlanta (ATL) has a nonstop connection to San Jose, California (SJC). Stuttgart, Germany (STR), has nonstop connections to Atlanta and San Jose. San Jose has a nonstop connection to Stuttgart.

Unary Relationship Types

[Visual, upper part: AIRPORT with a _nonstop_to_ arrow that returns to AIRPORT]

[Visual, lower part: relationship instance diagram]

AIRPORT instances:

Airport Code   Country   City
SJC            USA       San Jose
ATL            USA       Atlanta
STR            Germany   Stuttgart

_nonstop_to_ connections: ATL to SJC; STR to ATL; STR to SJC; SJC to STR



Figure 4-15. A Special Relationship Type CF182.0

Notes:

Let us look closer at the business relationship type for Come Aboard associating airports with itineraries. It states that itineraries consist of legs. The legs are nonstop connections between two airports, the starting airport (airport of departure) and the ending airport (airport of arrival) for the respective leg.

As explained on the previous visual, the legs (nonstop connections) are relationship instances and not entity instances because they only reflect the fact that two airports are interconnected, i.e., have a relationship. The appropriate relationship type was called AIRPORT_nonstop_to_AIRPORT.

Since itineraries consist of one or more legs, they have relationships with legs, i.e., with relationships (relationship instances). This means that we need to extend the definition of relationship types to allow relationship types as source or target.


A Special Relationship Type

AIRPORTS - ITINERARIES

An itinerary consists of one or more legs. The legs are nonstop connections between two airports, the starting and ending airports for the leg. Airports can be the starting or ending points for legs of multiple itineraries.

Legs are relationship instances, not entity instances

Only reflect the fact that two airports are interconnected, i.e., have a relationship

Itineraries have relationships with relationships

Need to extend the definition of relationship types





Figure 4-16. Relationship Types - Generalized Definition CF182.0

Notes:

This visual contains the general relationship type definition. It extends the previous definition, which only allowed entity types as source or target, by allowing entity types or relationship types as source or target of relationship types. All kinds of combinations are allowed:

• The relationship instances can interconnect the instances of two (not necessarily different) entity types. This was the initial, restricted, definition of relationship types.

• The relationship instances can interconnect the instances of two (not necessarily different) relationship types.

• The relationship instances can interconnect an entity instance and a relationship instance. Either one can be the source or the target. It is not important here which one is the source or the target because the role can be reversed by selecting the other direction of the relationship type as the primary direction.

A relationship instance in this extended sense is nothing else than a specific interrelation of the considered relationship type.

Relationship Types - Generalized Definition

Relationship Type

A conceptual association between:

The entity instances, one each, of two not necessarily different entity types

The relationship instances, one each, of two not necessarily different relationship types

The entity instances and the relationship instances, one of each, of an entity type and a relationship type

Relationship (Instance)

A specific interrelation of a given relationship type




Figure 4-17. Relationship Type on Relationship Type CF182.0

Notes:

As described in the problem statement for Come Aboard, an itinerary consists of one or more legs. We have already determined that the legs are relationship instances of relationship type AIRPORT_nonstop_to_AIRPORT. Accordingly, we need a relationship type interconnecting this relationship type and entity type ITINERARY. In the entity-relationship model portion above, this relationship type is represented as an arrow from relationship type AIRPORT_nonstop_to_AIRPORT (source) to entity type ITINERARY (target). Its abbreviated name is _in_. According to our naming convention, the full name of the (primary direction of the) relationship type is:

AIRPORT_nonstop_to_AIRPORT_in_ITINERARY

If necessary, parentheses may be used to avoid duplicate names or any misunderstandings.

Relationship Type on Relationship Type

[Visual, upper part: AIRPORT with the unary relationship type _nonstop_to_; an arrow named _in_ leads from relationship type _nonstop_to_ to ITINERARY.]

[Visual, lower part: relationship instance diagram with ITINERARY instances YY0025, YY0100, and YY3367 (Flight Number), AIRPORT instances FRA, ATL, STR, and SFO (Airport Code), _nonstop_to_ arrows between the airports, and _in_ arrows from the _nonstop_to_ arrows to the itineraries.]

The lower part of the visual illustrates a relationship instance diagram for the entity-relationship model portion of the upper part:

• The itinerary for flight number YY3367 is composed of two nonstop connections: one from Atlanta (ATL) to Stuttgart (STR) and one from Stuttgart to Frankfurt (FRA).

• The itinerary for flight number YY0025 consists of three legs (nonstop connections): one from Stuttgart to Frankfurt, one from Frankfurt to Atlanta, and one from Atlanta to San Francisco (SFO).

• The itinerary for flight number YY0100 consists of two legs: one from San Francisco to Atlanta and one from Atlanta to Stuttgart.

If the airline company had round flights, i.e., flights returning to their airport of departure (e.g., sightseeing flights), airports could be connected to themselves.

The model does not make a statement about the order of the legs although the problem statement specifies that an itinerary is an ordered collection of nonstop connections. For the sample itineraries above, we have used the implicit rule that the starting airport for the next leg must be the ending airport for the previous leg. This rule may not always hold true or it may not provide the order of the legs if the starting and ending airports for an itinerary are the same (around-the-world trips). It is possible to model the order of the legs. We will do this after we have talked about the necessary modeling constructs later in this unit.
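The implicit ordering rule and its limits can be tried out with a short Python sketch. The function `order_legs` and the representation of legs as (starting airport, ending airport) pairs are assumptions made for this illustration; they are not the modeling construct introduced later in this unit.

```python
def order_legs(legs):
    """Order legs so that each leg starts where the previous leg ended.

    Returns None if the implicit rule does not determine a unique order.
    """
    legs = list(legs)
    starts = [start for start, _ in legs]
    ends = {end for _, end in legs}
    # The first leg must start at an airport at which no leg ends.
    first = [leg for leg in legs if leg[0] not in ends]
    if len(first) != 1 or len(set(starts)) != len(starts):
        return None
    by_start = {start: (start, end) for start, end in legs}
    ordered = [first[0]]
    while ordered[-1][1] in by_start and len(ordered) < len(legs):
        ordered.append(by_start[ordered[-1][1]])
    # Every leg must have been used exactly once.
    return ordered if set(ordered) == set(legs) else None


yy0025 = {("FRA", "ATL"), ("ATL", "SFO"), ("STR", "FRA")}
print(order_legs(yy0025))
# [('STR', 'FRA'), ('FRA', 'ATL'), ('ATL', 'SFO')]

# Hypothetical trip that returns to its starting airport: no unique first leg.
print(order_legs({("SFO", "ATL"), ("ATL", "SFO")}))
# None
```

The second call shows the gap described above: when the starting and ending airports of an itinerary coincide, the rule alone cannot tell which leg comes first.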

As defined, relationship types are binary in nature in that all relationship instances interconnect two instances. Many modeling methodologies only allow entity types as source or target of relationship types and, in order to compensate for the loss of functionality, introduce n-ary relationship types.

N-ary relationship types interconnect the instances of n entity types. The business relationship type Airports - Itineraries used for this visual would be considered as a ternary (3-ary) relationship type by these methodologies whose instances interconnect three entity instances: a starting airport, an ending airport, and the itinerary. For the correct interpretation of n-ary relationship types, you need to define the roles of the entity types within the relationship types.

In the case of our sample ternary relationship type, you need to specify that the first airport is the starting airport for the leg of the itinerary and the second airport the ending airport of that leg. By doing this, you implicitly define a relationship between the two airports, namely, that they are the starting and ending airports for a nonstop connection. This is the relationship type explicitly implemented in the entity-relationship model portion of the visual. It more clearly expresses the actual situation.

Binary relationship types are sufficient if relationship types are allowed as source or target. By using only binary relationship types as we have defined them, the application domain is much better structured and hidden relationship types are revealed. Furthermore, using only binary relationship types avoids violations of the Fourth Normal Form and the Fifth Normal Form.
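For the sample itineraries YY3367 and YY0100, the following Python sketch contrasts the ternary view with the two binary relationship types used in this course. The variable names and the representation as tuples are assumptions made for this illustration. It shows, for this sample data only, that the ternary instances can be rebuilt from the binary ones and that the hidden relationship type between airports becomes explicit.

```python
# Ternary view: (starting airport, ending airport, itinerary).
ternary = {
    ("ATL", "STR", "YY3367"),
    ("STR", "FRA", "YY3367"),
    ("SFO", "ATL", "YY0100"),
    ("ATL", "STR", "YY0100"),
}

# Binary view, step 1: AIRPORT_nonstop_to_AIRPORT.
nonstop_to = {(start, end) for start, end, _ in ternary}

# Binary view, step 2: AIRPORT_nonstop_to_AIRPORT_in_ITINERARY.
# The source of each instance is itself a relationship instance (a leg).
leg_in_itinerary = {((start, end), itinerary) for start, end, itinerary in ternary}

# The ternary instances can be rebuilt from the binary ones.
rebuilt = {(start, end, itinerary) for (start, end), itinerary in leg_in_itinerary}
print(rebuilt == ternary)        # True
print(len(nonstop_to))           # 3
```

The leg from Atlanta to Stuttgart is used by two itineraries but exists only once as an instance of AIRPORT_nonstop_to_AIRPORT.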




Figure 4-18. Relationship Type Versus Attribute CF182.0

Notes:

It is not always clear whether a business relationship type must be modeled as a relationship type or just constitutes an attribute of an entity type. This is illustrated on the visual by means of business relationship type AIRCRAFT - MAINTENANCE RECORDS for our sample airline company.

The fact that there is a business relationship type seems to indicate that the entity-relationship model should include a relationship type AIRCRAFT_has_MAINTENANCE RECORD interconnecting entity types AIRCRAFT and MAINTENANCE RECORD.

Relationship Type Versus Attribute

AIRCRAFT - MAINTENANCE RECORDS

As an aircraft is serviced, a maintenance record for the aircraft is established. A maintenance record applies to one and only one aircraft. For an aircraft, there may be multiple maintenance records.

The maintenance records for an aircraft contain the serial number for the aircraft. All maintenance records for an aircraft must be kept for the time the aircraft is owned by CAB and for two years thereafter. This implies that the maintenance records must still be kept after the remaining information for the aircraft has been deleted.

[Visual: AIRCRAFT _has_ MAINTENANCE RECORD, and entity type MAINTENANCE RECORD with attributes Maintenance Number (K), Aircraft Number, . . .]

Relationships require that source and target instances exist

Not guaranteed here

Cannot be expressed by a relationship type

Aircraft Number must be an attribute

However, the description of the business relationship type states that the maintenance records for an aircraft contain the serial number for the aircraft (aircraft number). This seems to indicate that the aircraft number should be an attribute of entity type MAINTENANCE RECORD. But, do not be fooled! The aforementioned text only expresses that the aircraft number is displayed with a maintenance record (e.g., in the maintenance-record form on paper or in a window on a screen). It does not describe how the associations between aircraft and maintenance records are internally stored in a database. For a relational database management system, they would not be stored as part of the maintenance records if a maintenance record belonged to multiple aircraft. (In the case of our sample airline company, it can only belong to one aircraft.)

Moreover, the entity-relationship model is part of the conceptual view during which only the conceptual interrelationships, and not any physical implementations, should be considered. Accordingly, the fact that a maintenance record contains the aircraft number rather expresses the relationship between maintenance records and aircraft (the inverse direction of relationship type AIRCRAFT_has_MAINTENANCE RECORD).

Well, we must disappoint you in this case ... Unfortunately, the description of the business relationship type includes the remark that the maintenance records (including the aircraft number) must be kept even after the remaining information for the aircraft has been deleted. This means that the association with the aircraft must be maintained.

This requirement prevents modeling the business relationship type between aircraft and maintenance records as a relationship type in the entity-relationship model since relationship instances, at all times, require the existence of their source and target instances. A relationship instance (automatically) disappears when its source or target instance is deleted.

Thus, the considered business relationship type cannot be expressed as a relationship type. It must be expressed as an attribute of entity type MAINTENANCE RECORD.

If such an anomaly, as exemplified by the considered business relationship type, does not exist, an association between two entity types should always be expressed as a relationship type in the entity-relationship model regardless of any future implementation considerations.
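The difference between the two alternatives can be illustrated with a minimal Python sketch. The aircraft number `AC-1`, the maintenance number `5001`, and the function `delete_aircraft` are hypothetical and serve only to show what happens when the aircraft is deleted.

```python
def delete_aircraft(aircraft, has_record, aircraft_number):
    """Delete an aircraft together with every relationship instance it takes part in."""
    remaining_aircraft = {n: a for n, a in aircraft.items() if n != aircraft_number}
    remaining_links = {(n, m) for (n, m) in has_record if n != aircraft_number}
    return remaining_aircraft, remaining_links


# Hypothetical sample data: one aircraft and one maintenance record.
aircraft = {"AC-1": {"type_code": "B747", "model_number": "400"}}

# Alternative 1: the association as relationship instances (aircraft, record).
has_record = {("AC-1", 5001)}

# Alternative 2: the aircraft number as an attribute of the maintenance record.
maintenance_records = {5001: {"aircraft_number": "AC-1"}}

aircraft, has_record = delete_aircraft(aircraft, has_record, "AC-1")

print(has_record)                                    # set()
print(maintenance_records[5001]["aircraft_number"])  # AC-1
```

After the deletion, the relationship instance is gone, whereas the attribute still records which aircraft the maintenance record applied to, as the problem statement requires.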




Figure 4-19. Relationship Types for CAB CF182.0

Notes:

This visual contains the relationship types for our sample airline company called Come Aboard. The relationship types were derived from the problem statement contained in Appendix A - Sample Problem Statement. Since this is a fairly good problem statement, the relationship types could easily be derived from the business relationship types described by the problem statement. However, later in this unit, we will see that some additional relationship types will have to be added.

Note that there is no relationship type between MAINTENANCE RECORD and AIRCRAFT as discussed before.

Relationship Types for CAB

[Visual: the entity types PILOT, MECHANIC, ITINERARY, FLIGHT, AIRPORT, AIRCRAFT, AIRCRAFT MODEL, and MAINTENANCE RECORD, interconnected by relationship types with the abbreviated names _belongs_to_, _from_, _scheduled_for_, _trained_for_, _can_land_at_, _can_fly_, _nonstop_to_, _captain_for_, _copilot_for_, _in_, and _for_ (shown four times)]





Figure 4-20. Cardinalities CF182.0

Notes:

For modeling purposes and for the transformation of an entity-relationship model into tuple types and tables, it is important to know if an instance of the source of a relationship type can have relationships with multiple target instances, or vice versa, or only with a single target or source instance. It is also important to know if a source or target instance must always be connected to at least one target or source instance, respectively.

Since the relationship types of the entity-relationship model mostly correspond to the business relationship types of the problem statement, the multiplicities for the relationship types should be reflected by the descriptions of the corresponding business relationship types in the problem statement. The multiplicities were required input for the business relationship types. Since they are application-domain specific, the database designer should not make assumptions about them on his/her own. He/She should consult the domain expert or the appropriate department of expertise to obtain the correct information.

Cardinalities

[Visual, upper part: AIRCRAFT MODEL _for_ (_belongs_to_) AIRCRAFT with cardinality 1..1 at the AIRCRAFT MODEL end and 0..m at the AIRCRAFT end]

An aircraft belongs to one and only one aircraft model

An aircraft model may be for many aircraft

[Visual, lower part: AIRCRAFT _for_ (_has_been_assigned_) FLIGHT with cardinality 0..1 at the AIRCRAFT end and 0..m at the FLIGHT end; the abbreviations 1 and m are also shown]

At most one aircraft can be assigned to a flight

An aircraft may be (used) for many flights

Possible cardinalities: 0..1   0..m   1..1   1..m

In the entity-relationship model, the multiplicities are expressed by cardinalities:

• The cardinality for the target describes how many target instances may be associated with a single source instance and is placed close to the connecting arrow at the target (end) of the relationship type.

• The cardinality for the source expresses how many source instances may be associated with a single target instance and is placed close to the connection arrow at the source (end) of the relationship type.

• A cardinality consists of two values, a minimum value and a maximum value separated by two periods:

minimum .. maximum

Minimum can be 0 (zero) or 1. Maximum can be 1 or m where m is used as abbreviation for many.

A minimum of 0 for the cardinality of the target (source) means that a source (target) instance need not have a relationship with a target (source) instance.

A minimum of 1 for the cardinality of the target (source) means that a source (target) instance must always have at least one relationship with the instances of the target (source).

A maximum of 1 for the cardinality of the target (source) means that a source (target) instance cannot have more than one relationship with the instances of the target (source).

A maximum of m for the cardinality of the target (source) means that a source (target) instance can have many relationships with the instances of the target (source).

The upper relationship type on the visual interconnects aircraft models and aircraft. For the corresponding business relationship type, the problem statement states the following:

• An aircraft belongs to one and only one aircraft model.

• An aircraft model may apply to multiple aircraft.

The fact that an aircraft belongs to one and only one aircraft model is expressed by a cardinality of 1..1 at the AIRCRAFT MODEL end of the relationship type since it describes the cardinality for the source. The fact that an aircraft model may apply to multiple aircraft is expressed by a cardinality of 0..m at the AIRCRAFT end of the relationship type. The minimum value of zero allows for aircraft models for which there is no aircraft.

The lower part of the visual illustrates the cardinalities for relationship type AIRCRAFT_for_FLIGHT for our sample airline company. According to the problem statement, an aircraft may be used for many flights resulting in a target cardinality of 0..m. Note that aircraft need not be assigned to flights at all times. According to the problem statement, at most one aircraft can be assigned to a flight, but there need not be an aircraft assigned to a flight. This results in a cardinality of 0..1 for the source of the relationship type.
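To see how the minimum and maximum values restrict the relationship instances, consider the following Python sketch. The helper names (`parse_cardinality`, `within`, `violations`) and the aircraft numbers `AC-1` to `AC-3` are assumptions made for this illustration. Relationship instances are given as (source, target) pairs.

```python
from collections import Counter


def parse_cardinality(text):
    """Split 'minimum..maximum'; a maximum of None stands for m (many)."""
    minimum, maximum = text.split("..")
    if minimum not in ("0", "1") or maximum not in ("1", "m"):
        raise ValueError(f"invalid cardinality: {text}")
    return int(minimum), (None if maximum == "m" else 1)


def within(count, cardinality):
    minimum, maximum = parse_cardinality(cardinality)
    return count >= minimum and (maximum is None or count <= maximum)


def violations(sources, targets, pairs, source_card, target_card):
    """Return the instances whose number of relationships breaks a cardinality."""
    per_source = Counter(source for source, _ in pairs)
    per_target = Counter(target for _, target in pairs)
    # The target cardinality limits the target instances per source instance,
    # the source cardinality limits the source instances per target instance.
    bad = [s for s in sources if not within(per_source[s], target_card)]
    bad += [t for t in targets if not within(per_target[t], source_card)]
    return bad


# AIRCRAFT MODEL_for_AIRCRAFT: source cardinality 1..1, target cardinality 0..m.
models = [("B747", "400"), ("A340", "100")]
aircraft = ["AC-1", "AC-2", "AC-3"]          # hypothetical aircraft numbers
pairs = {(("B747", "400"), "AC-1"), (("B747", "400"), "AC-2")}

print(violations(models, aircraft, pairs, "1..1", "0..m"))   # ['AC-3']
```

Aircraft AC-3 is reported because it does not belong to any aircraft model, which the source cardinality 1..1 forbids. Aircraft model A340, Model 100, is not reported because the target cardinality 0..m allows aircraft models without aircraft.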

Because of the minimum and maximum values they can assume, the possible cardinalities are:




• 0..1 = at most one
• 0..m = any number
• 1..1 = one and only one
• 1..m = one or more

Since 0..1 and 0..m are the most common cardinalities, 1 and m can be used as abbreviations for them.

Relationship types with cardinalities ..m (meaning 0..m or 1..m) at both ends are referred to as m:m relationship types (m to m).

Relationship types with cardinalities ..1 (meaning 0..1 or 1..1) at both ends are referred to as 1:1 relationship types (one to one).

Relationship types with cardinality ..m at one end and ..1 at the other end are referred to as 1:m relationship types (one to m).

A further classification of the relationship types is the following:

Relationship types with cardinality 1.. (meaning 1..1 or 1..m) at at least one end are referred to as mandatory relationship types (mandatory for the source or target or both). It is mandatory for the source if the target cardinality is 1..1 or 1..m and mandatory for the target if the source cardinality is 1..1 or 1..m.

Relationship types with cardinality 0.. (meaning 0..1 or 0..m) at at least one end are referred to as optional or conditional relationship types (conditional for the source or target or both).
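The two classifications above can be expressed as a small function. This is a minimal sketch, assuming that cardinalities are written as strings such as "1..m" and that the abbreviations 1 and m stand for 0..1 and 0..m as stated earlier; the function name `classify` and the result layout are hypothetical.

```python
def classify(source, target):
    """Classify a relationship type from its source and target cardinalities.

    Each cardinality is a string such as "0..1" or "1..m"; the
    abbreviations "1" (= 0..1) and "m" (= 0..m) are accepted as well.
    """
    def parse(cardinality):
        cardinality = {"1": "0..1", "m": "0..m"}.get(cardinality, cardinality)
        minimum, maximum = cardinality.split("..")
        return int(minimum), maximum

    (source_min, source_max) = parse(source)
    (target_min, target_max) = parse(target)

    if source_max == "m" and target_max == "m":
        kind = "m:m"
    elif source_max == "1" and target_max == "1":
        kind = "1:1"
    else:
        kind = "1:m"

    return {
        "kind": kind,
        # A minimum of 1 at the target end obliges every source instance.
        "mandatory_for_source": target_min == 1,
        # A minimum of 1 at the source end obliges every target instance.
        "mandatory_for_target": source_min == 1,
    }
```

For AIRCRAFT MODEL_for_AIRCRAFT, `classify("1..1", "m")` yields a 1:m relationship type that is mandatory for the target (every aircraft needs a model) and optional for the source.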




Figure 4-21. Cardinalities (Example 1) CF182.0

Notes:

The above visual illustrates the cardinalities for relationship type MAINTENANCE RECORD_from_MECHANIC describing the interrelationships between maintenance records and mechanics for our sample airline company.

A maintenance record must be from at least one mechanic as indicated by the minimum value of 1 for the target cardinality (at the MECHANIC end of the relationship type). Accordingly, in the relationship instance diagram, there must be at least one connection from each maintenance record to a mechanic. The maximum value of 1 for the cardinality reflects that a maintenance record can be from at most one mechanic. Consequently, there must not be more than one connection from a maintenance record to mechanics.

The source cardinality of m is equivalent to a cardinality of 0..m. It specifies that a mechanic may be responsible for multiple (many) maintenance records, but need not be responsible for any:

• Mechanic 9163488 is responsible for a single maintenance record.

• Mechanic 0275912 is responsible for two, i.e., multiple, maintenance records.

• Mechanic 4712002 (currently) is not responsible for any maintenance records.

[Visual: Cardinalities (Example 1). Relationship type MAINTENANCE RECORD_from_MECHANIC with cardinality m at the MAINTENANCE RECORD end (possibly many maintenance records from a mechanic) and 1..1 at the MECHANIC end (a maintenance record is from at least one and at most one mechanic). The instance diagram shows mechanics 4712002, 0275912, and 9163488 and maintenance records 10385, 10386, and 10404.]



Figure 4-22. Cardinalities (Example 2) CF182.0

Notes:

On the above visual, AIRPORT_nonstop_to_AIRPORT is an m:m relationship type: An airport can be the airport of arrival or the airport of departure for any number of nonstop connections (legs).

According to the problem statement for Come Aboard, an itinerary must always have at least one leg and can have multiple legs. Thus, the source cardinality must be 1..m for the _in_ relationship type. Note the way the cardinality is written on the visual to save space.

The target cardinality for the _in_ relationship type is m, meaning that the legs may be part of multiple itineraries, but need not belong to any itinerary. It is fair to ask whether a leg should not always belong to at least one itinerary, which would result in a target cardinality of 1..m rather than 0..m. Why have a nonstop connection otherwise? The problem statement for our airline company is not precise in this regard, and we must consult the domain expert for the correct cardinality. His or her answer is that CAB wants to record planned nonstop connections between airports even before itineraries are established. This means that the cardinality of m is correct.

[Visual: Cardinalities (Example 2). Relationship type AIRPORT_nonstop_to_AIRPORT with cardinality m at both ends, and relationship type _in_ connecting it to ITINERARY with source cardinality 1..m (an itinerary has at least one leg and can have many legs) and target cardinality m (a leg may occur in zero, one, or many itineraries).]

Relationship types with cardinalities of 1.. at both ends represent a kind of "chicken and egg" problem when adding instances for the source or target. If an instance for the source is added, the corresponding target instance, if it does not exist yet, and the interconnecting relationship instance must be added at the same time. A similar scenario applies to adding a target instance. The transaction concept of relational database management systems allows this provided that the completeness check is performed at the end of the transaction, i.e., when the transaction is committed, and not when the source or target instance is inserted.

To avoid the problem from the beginning, it may be preferable to change one of the minimum cardinalities to 0. In the case of our example, this allows the legs to be established (first) without a check being performed. However, you should note that this is an implementation problem and not a conceptual design problem. Therefore, you should use 1.. cardinalities at both ends, if that is what the application domain requires, and handle the resulting problem during the later design phases.
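The effect of postponing the completeness check to the end of the transaction can be tried out with a small experiment. The sketch below is an illustration under stated assumptions, not the implementation used in this course: it uses SQLite through Python's standard sqlite3 module only because it is easy to run, the tables `itinerary` and `leg` and their columns are hypothetical, and the mutual dependency is simplified to two foreign keys declared DEFERRABLE INITIALLY DEFERRED.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical tables: every itinerary names a first leg and every leg
    # names its itinerary, so each row needs a row in the other table.
    conn.execute("""
        CREATE TABLE itinerary (
            flight_number TEXT PRIMARY KEY,
            first_leg     INTEGER NOT NULL
                REFERENCES leg (leg_id) DEFERRABLE INITIALLY DEFERRED
        )""")
    conn.execute("""
        CREATE TABLE leg (
            leg_id        INTEGER PRIMARY KEY,
            flight_number TEXT NOT NULL
                REFERENCES itinerary (flight_number) DEFERRABLE INITIALLY DEFERRED
        )""")

    outcomes = []

    # Both rows are inserted in one transaction; the check at commit passes.
    conn.execute("INSERT INTO itinerary VALUES ('YY3367', 1)")
    conn.execute("INSERT INTO leg VALUES (1, 'YY3367')")
    conn.commit()
    outcomes.append("committed")

    # An itinerary without its leg is accepted by the INSERT itself,
    # but the transaction is rejected when it is committed.
    conn.execute("INSERT INTO itinerary VALUES ('YY9999', 99)")
    try:
        conn.commit()
        outcomes.append("committed")
    except sqlite3.IntegrityError:
        conn.rollback()
        outcomes.append("rejected at commit")

    rows = conn.execute("SELECT flight_number FROM itinerary").fetchall()
    conn.close()
    return outcomes, rows
```

With immediate checking instead of deferred checking, the very first INSERT would already fail, which is the "chicken and egg" situation described above.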




Figure 4-23. Defining Attributes and Relationship Key CF182.0

Notes:

To fully describe a relationship instance, you must specify the source and target instances interconnected by the relationship instance. The source and target instances can be identified by means of the values of their keys. If the source and target of the relationship type are entity types, the keys are the respective entity keys. We will see in a moment what the key is if the source or the target is a relationship type.

Since the keys of source and target completely describe and define the possible relationship instances, they are referred to as defining attributes of the relationship type. The defining attributes of a relationship type are completely independent of the cardinalities for the relationship type.

Similar to the introduction of the term entity key, the term relationship key is introduced to denote a subset of the defining attributes of a relationship type that can be used to uniquely identify the potential relationship instances and does not contain any defining attributes not needed for the unique identification (minimum principle).

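[Visual: Defining Attributes and Relationship Key. The defining attributes of a relationship type are the key of the source and the key of the target, independent of the cardinalities. The relationship key is the key of the source and the key of the target if both cardinalities are ..m, the key of the target if the source cardinality is ..1 and the target cardinality is ..m, the key of the source if the source cardinality is ..m and the target cardinality is ..1, and the key of the source or the key of the target if both cardinalities are ..1.]

It depends on the cardinalities for the relationship type which of the defining attributes can form the relationship key:
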
• If both cardinalities of the relationship type are ..m cardinalities (i.e., 0..m or 1..m), each source instance can be associated with multiple target instances and each target instance with multiple source instances. Thus, each source or target key value may occur as defining attribute of multiple relationship instances.
Consequently, a relationship instance can only be uniquely identified (referred to) by providing both the key value for the source and the target. Accordingly, the relationship key for the relationship type consists of the key of the source and the key of the target.

• If the cardinality of the source is ..1 and the cardinality of the target is ..m, there may be multiple target instances for each source instance, but there may be only one source instance for any target instance. Thus, a source key value may occur as defining attribute of multiple relationship instances whereas a target key value can only occur as defining attribute of a single relationship instance.

Consequently, a relationship instance can uniquely be identified by providing the value of its target defining attribute, i.e., the key value of its target instance. In other words, the relationship key consists of (the attributes of) the key of the target of the relationship type.

• Similarly, if the cardinality of the target is ..1 and the cardinality of the source is ..m, there may be multiple source instances for each target instance, but there may be only one target instance for any source instance. Thus, the target key value may occur as defining attribute of multiple relationship instances whereas a source key value can only occur as defining attribute of a single relationship instance.

Consequently, a relationship instance can uniquely be identified by providing the value of its source defining attribute, i.e., the key value of its source instance. In other words, the relationship key consists of (the attributes of) the key of the source of the relationship type.

• If both cardinalities are ..1, for every source instance there may only be one target instance and vice versa. Thus, each source or target key value may occur once as defining attribute of a relationship instance. A relationship instance can be uniquely identified by providing the key value of its source instance or the key value of its target instance. Only one is required.

Accordingly, you can choose as relationship key either the key of the source or the key of the target of the relationship type, but not both (minimum principle).
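The four cases above amount to a simple decision rule on the maximum cardinalities. The following Python sketch restates that rule; the function name `relationship_key` and the representation of keys as tuples of attribute names are assumptions made for this illustration.

```python
def relationship_key(source_key, target_key, source_max, target_max):
    """Select the relationship key from the defining attributes.

    source_key, target_key -- tuples of attribute names
    source_max, target_max -- maximum cardinality at each end: "1" or "m"
    Returns a list of candidate keys; it holds two entries only for
    ..1 at both ends, where either key may be chosen (but not both).
    """
    if source_max == "m" and target_max == "m":
        return [source_key + target_key]
    if source_max == "1" and target_max == "m":
        return [target_key]
    if source_max == "m" and target_max == "1":
        return [source_key]
    if source_max == "1" and target_max == "1":
        return [source_key, target_key]
    raise ValueError("maximum cardinality must be '1' or 'm'")
```

For example, with source key (Type Code, Model Number), target key (Aircraft Number), and cardinalities ..1 at the source and ..m at the target, the function returns the target key as the only candidate.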




Figure 4-24. Relationship Key (Example 1) CF182.0

Notes:

The visual shows relationship type AIRCRAFT MODEL_for_AIRCRAFT, a 1:m relationship type since there may be multiple aircraft for each aircraft model, but one and only one aircraft model for each aircraft.

The defining attributes for the relationship type are the keys of the source and the target, i.e., Type Code and Model Number from AIRCRAFT MODEL and Aircraft Number from AIRCRAFT.

Since there is only one aircraft model for each aircraft, Aircraft Number, i.e., the key of entity type AIRCRAFT, becomes the relationship key.

[Visual: Relationship Key (Example 1). Relationship type AIRCRAFT MODEL_for_AIRCRAFT with cardinality 1..1 at the AIRCRAFT MODEL end and m at the AIRCRAFT end. Defining attributes: Type Code, Model Number, Aircraft Number. Relationship key: Aircraft Number.]


Figure 4-25. Relationship Key (Example 2) CF182.0

Notes:

If we want to determine the defining attributes or the relationship key of relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY, we first need to find the relationship key of relationship type AIRPORT_nonstop_to_AIRPORT. Its source and target are entity types so that we can immediately derive its defining attributes and relationship key. The defining attributes are twice Airport Code, once playing the role of the airport of departure (From) and once the role of the airport of arrival (To).

To make this apparent, you can (and should) indicate the respective roles at the appropriate ends of the relationship type. The defining attributes for the relationship type should be named accordingly. As done on the visual, you should add, in parentheses, the original name of the attributes since the roles only act as synonyms for them.

AIRPORT_nonstop_to_AIRPORT is an m:m relationship type. Therefore, the relationship key consists of all defining attributes.

After having determined the relationship key of AIRPORT_nonstop_to_AIRPORT, we also know the defining attributes of relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY. They consist of the key of the target and the key of the source for the relationship type, i.e., of Flight Number (from ITINERARY) and From and To (from AIRPORT_nonstop_to_AIRPORT). The sequence of the attributes is not important.

[Visual: Relationship Key (Example 2). Relationship type AIRPORT_nonstop_to_AIRPORT (roles From and To, cardinality m at both ends) has the defining attributes and relationship key From (Airport Code) and To (Airport Code). Relationship type _in_, connecting it to ITINERARY (key Flight Number), has the defining attributes and relationship key Flight Number, From (Airport Code), and To (Airport Code).]

Since AIRPORT_nonstop_to_AIRPORT_in_ITINERARY is an m:m relationship type, its relationship key consists of its defining attributes, i.e., Flight Number, From, and To.

When determining the defining attributes or the relationship key of a relationship type, you must back-step until you finally reach relationship types whose source and target are entity types. Start determining the defining attributes and the relationship keys from there. If the source or target of the _nonstop_to_ relationship type had been relationship types, you would have had to back-step further to determine the defining attributes and the relationship key.
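The back-stepping described above is a recursion over the sources and targets of relationship types. The following Python sketch is one possible way to express it; the class names `EntityType` and `RelationshipType`, the function names, and the way roles rename attributes are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class EntityType:
    name: str
    key: Tuple[str, ...]


@dataclass(frozen=True)
class RelationshipType:
    source: Union["EntityType", "RelationshipType"]
    target: Union["EntityType", "RelationshipType"]
    source_max: str                     # "1" or "m"
    target_max: str                     # "1" or "m"
    source_role: Optional[str] = None   # e.g. "From"
    target_role: Optional[str] = None   # e.g. "To"


def key_of(node):
    """Entity key or, by back-stepping, relationship key of a source or target."""
    if isinstance(node, EntityType):
        return node.key
    return derive_key(node)


def _renamed(key, role):
    """Prefix each attribute with its role, keeping the original name in parentheses."""
    if role is None:
        return key
    return tuple(f"{role} ({attribute})" for attribute in key)


def _end_keys(rel):
    return (_renamed(key_of(rel.source), rel.source_role),
            _renamed(key_of(rel.target), rel.target_role))


def defining_attributes(rel):
    source_key, target_key = _end_keys(rel)
    return source_key + target_key


def derive_key(rel):
    source_key, target_key = _end_keys(rel)
    if rel.source_max == "m" and rel.target_max == "m":
        return source_key + target_key
    if rel.source_max == "1" and rel.target_max == "m":
        return target_key
    # ..m / ..1 needs the source key; for ..1 / ..1 either key would do
    # and the source key is chosen here arbitrarily.
    return source_key


airport = EntityType("AIRPORT", ("Airport Code",))
itinerary = EntityType("ITINERARY", ("Flight Number",))
nonstop_to = RelationshipType(airport, airport, "m", "m", "From", "To")
in_itinerary = RelationshipType(nonstop_to, itinerary, "m", "m")
```

Calling `derive_key(in_itinerary)` first derives the key of `nonstop_to` and then appends Flight Number. The attribute order differs from the order used in the notes, which does not matter since the sequence of the attributes is not important.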





Figure 4-26. Cardinalities for CAB CF182.0

Notes:

This visual contains the cardinalities for the relationship types for our sample airline company called Come Aboard. The cardinalities for the relationship types were derived from the description of the business relationship types contained in the problem statement in Appendix A - Sample Problem Statement.

Based on the cardinalities, the relationship types have the following relationship keys:

Relationship Type                                           Relationship Key

AIRCRAFT MODEL_for_AIRCRAFT                                 Aircraft Number
AIRPORT_nonstop_to_AIRPORT                                  From (Airport Code), To (Airport Code)
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY                     Flight Number, From (Airport Code), To (Airport Code)
AIRCRAFT_for_FLIGHT                                         Flight Number, From (Airport Code), To (Airport Code), Flight Locator
MAINTENANCE RECORD_from_MECHANIC                            Maintenance Number
AIRCRAFT MODEL_can_land_at_AIRPORT                          Type Code, Model Number, Airport Code
PILOT_can_fly_AIRCRAFT MODEL                                Employee Number, Type Code, Model Number
PILOT_captain_for_FLIGHT                                    Flight Number, From (Airport Code), To (Airport Code), Flight Locator
PILOT_copilot_for_FLIGHT                                    Flight Number, From (Airport Code), To (Airport Code), Flight Locator
AIRCRAFT MODEL_for_AIRPORT_nonstop_to_AIRPORT_in_ITINERARY  Flight Number, From (Airport Code), To (Airport Code)
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_for_FLIGHT          Flight Number, From (Airport Code), To (Airport Code), Flight Locator
MECHANIC_trained_for_AIRCRAFT MODEL                         Employee Number, Type Code, Model Number
MECHANIC_scheduled_for_AIRCRAFT                             Employee Number, Aircraft Number
MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD            Maintenance Number

[Visual: Cardinalities for CAB. Entity-relationship diagram for Come Aboard with entity types PILOT, MECHANIC, ITINERARY, FLIGHT, AIRPORT, AIRCRAFT, AIRCRAFT MODEL, and MAINTENANCE RECORD and the cardinalities of the relationship types listed above.]




4.3 Dependent Entity Types, Supertypes, and Subtypes



Figure 4-27. A First Correction of the CAB Model CF182.0

Notes:

A closer look at entity type AIRCRAFT MODEL for our sample airline company reveals that it contains some attributes that are not really aircraft model specific, but rather aircraft type specific. For example, Category (JET, TURBOPROP, etc.), Manufacturer, and Number of Engines are only dependent on Type Code, i.e., the type of the aircraft (e.g., B747), and not on the specific model. They are the same for all models of the same type.

This leads to the conclusion that business object type Aircraft Models as described by the problem statement is rather a combination of two entity types, namely, of entity types AIRCRAFT TYPE and AIRCRAFT MODEL as illustrated on the right-hand side of the visual. AIRCRAFT TYPE only contains the type-specific attributes and AIRCRAFT MODEL the model-specific attributes (e.g., Dimensions consisting of Length, Height, and Wing Span) that may be different for the models of a type.

The entity key of AIRCRAFT TYPE is Type Code. For AIRCRAFT MODEL, it consists of Type Code and Model Number as before since Model Number alone is not unique.

[Visual: A First Correction of the CAB Model. Left: entity type AIRCRAFT MODEL, in which Category, Manufacturer, and Number of Engines are marked as not model specific, only type specific, next to Dimensions (Length, Height, Wing Span). Right: entity type AIRCRAFT TYPE (key Type Code; attributes Category, Manufacturer, Number of Engines) connected by _for_ to entity type AIRCRAFT MODEL (key Type Code and Model Number; attribute Dimensions with Length, Height, Wing Span), with cardinality 1..1 at the AIRCRAFT TYPE end and 1..m at the AIRCRAFT MODEL end. The visual notes that this is a very special relationship type.]

Of course, entity types AIRCRAFT TYPE and AIRCRAFT MODEL are interconnected by a relationship type:

AIRCRAFT TYPE_for_AIRCRAFT MODEL

An aircraft model belongs to one and only one aircraft type. For an aircraft type, there may be many different aircraft models. The cardinality of 1..m for AIRCRAFT MODEL assumes that CAB keeps the information about an aircraft type only if it keeps at least one aircraft model for that type.

Splitting an entity type into two entity types as we have done here requires a reevaluation of the relationship types for which the former, combined, entity type was the source or target. For each relationship type, we must determine which of the new entity types will be its source or target.

Furthermore, the problem statement should be updated by the domain expert (in cooperation with the database designer) to reflect the application-domain aspects of the split entity types and of the new relationship type.

A closer look at the new relationship type reveals that an aircraft model cannot be connected to an arbitrary aircraft type. It can only be connected to specific aircraft types as illustrated on the next visual.




Figure 4-28. Dependent Entity Types CF182.0

Notes:

An aircraft model cannot be connected to an arbitrary aircraft type. It can only be associated with the aircraft type having the same type code as the aircraft model: A Boeing 747, Model 400 (Type Code = B747, Model Number = 400) is a Boeing 747 (type) and, therefore, can only be associated with the instance of AIRCRAFT TYPE having the entity key value B747.

An entity type being dependent on another entity type or on a relationship type in such a way that

• a part of its entity key or its full key is the key of the other entity type or the key of the relationship type

• each of its instances must be connected to, and only to, the entity instance or relationship instance with the matching key value

is referred to as a dependent entity type. In the entity-relationship model, the dependent entity type is identified by the letter D at its end of the relationship type establishing the dependency.

[Visual: Dependent Entity Types. Instance diagram in which the AIRCRAFT MODEL instances with Type Code B747 are connected by _for_ to the AIRCRAFT TYPE instance B747, and those with Type Code A310 to the AIRCRAFT TYPE instance A310; the type codes must be equal. In the entity-relationship diagram, AIRCRAFT TYPE_for_AIRCRAFT MODEL carries the letter D and cardinality 1..m at the AIRCRAFT MODEL end and no cardinality at the AIRCRAFT TYPE end since it is always 1..1.]

Because of this key interdependency, each dependent entity instance must belong to one and only one parent instance. Thus, the cardinality at the parent end of the relationship type establishing the dependency must always be 1..1 and is omitted to simplify the diagram.



Figure 4-29. Dependent Entity Types - Characteristics CF182.0

Notes:

As described before, a dependent entity type is an entity type fulfilling the following conditions:

• A part of its key or its entire key is equal to the key of another entity type or of a relationship type. This entity type or relationship type is referred to as parent entity type or parent relationship type, respectively.

• There must exist a relationship type between the parent entity type or relationship type and the dependent entity type with the following characteristics:

- Each instance of the dependent entity type is, at all times, connected to one and only one parent instance.

- The dependent and parent instances interconnected are those with matching values: The value of the appropriate key portion of the dependent entity instance must be equal to the key value of the parent instance.

The relationship type interconnecting the parent entity type or relationship type and the dependent entity type is referred to as owning relationship type.
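The matching-value condition can be checked directly on the key values. The sketch below is a minimal illustration, assuming that key values are tuples and that the parent's key forms the leading part of the dependent key; the function names and the sample model numbers other than B747 Model 400 are hypothetical.

```python
def dependency_violations(parent_keys, dependent_keys, shared=1):
    """List dependent instances that have no parent with a matching key value.

    parent_keys    -- set of parent key values, each a tuple
    dependent_keys -- dependent key values, each a tuple whose first
                      `shared` attributes are the key of the parent
    """
    return [key for key in dependent_keys if key[:shared] not in parent_keys]


def owning_relationship(dependent_keys, shared=1):
    """Derive the owning relationship instances from the dependent keys alone."""
    return {key: key[:shared] for key in dependent_keys}


aircraft_types = {("B747",), ("A310",)}
aircraft_models = [("B747", "400"), ("A310", "200"), ("B737", "300")]

orphans = dependency_violations(aircraft_types, aircraft_models)
owners = owning_relationship(aircraft_models)
```

Here the model with type code B737 is reported because no AIRCRAFT TYPE instance with that key value exists. Since each dependent key determines its parent, every dependent instance is connected to one and only one parent instance.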

[Visual: Dependent Entity Types - Characteristics. Owning relationship type AIRCRAFT TYPE_for_AIRCRAFT MODEL with parent entity type AIRCRAFT TYPE and dependent entity type AIRCRAFT MODEL (D, 1..m). The source (parent) can be an entity type or a relationship type; the target (dependent) must be an entity type. There is no cardinality at the parent end since it is always 1..1.]

An entity type must not be dependent on more than one entity type or relationship type. Should you see the need for a dependency on two parents, a relationship type between the parents is missing and should be established. Should you see a dependency on more than two parents, multiple interrelated relationship types are missing, and the dependent entity type is to be based on the last of them.
As discussed before, the defining attributes of a relationship type are the keys of its source and target. However, because of the matching values of the key/key portion, the key of the dependent entity type is sufficient to completely describe the owning relationship type. Therefore, the key of the parent is omitted.

As a consequence of the implied 1..1 cardinality at the parent end, the key of the dependent entity type is also the key of the owning relationship type.




Figure 4-30. Nondefining Attributes for Relationship Types CF182.0

Notes:

Besides the defining attributes, relationship types may have additional attributes further characterizing them. These attributes are referred to as nondefining attributes of the relationship type.

When talking about the nonstop connections for an itinerary, we observed that the legs of an itinerary must be ordered. Each instance of relationship type _in_ (AIRPORT_nonstop_to_AIRPORT_in_ITINERARY) having as target the considered itinerary represents a leg of that itinerary. Its defining attributes specify the flight number for the itinerary and the nonstop connection (starting and ending airports) for the leg.

To order the legs, it is necessary to assign an attribute (Leg Number) to relationship type _in_ by means of which the sequence of the legs for the itinerary can be established. The attribute cannot simply be added to entity type ITINERARY since, in this case, the itineraries would be ordered without considering the legs. The attribute must also not be an attribute for relationship type _nonstop_to_ (AIRPORT_nonstop_to_AIRPORT) since, in this case, the nonstop connections would be ordered without consideration for the itineraries. The order for a leg, however, depends on both the itinerary and the nonstop connection for the leg. Consequently, Leg Number must be an attribute for relationship type _in_ (AIRPORT_nonstop_to_AIRPORT_in_ITINERARY).

[Visual: Nondefining Attributes for Relationship Types. Relationship type _in_ between AIRPORT_nonstop_to_AIRPORT (roles From and To) and ITINERARY (key Flight Number), with dependent entity type LEG (D, cardinality 1..1) attached by owning relationship type _as_. LEG has the key attributes Flight Number, From, and To and the attribute Leg Number.]

Nondefining attributes are assigned to a relationship type by basing on it a dependent entity type that contains the attributes and the relationship key. Thus, to each instance of the relationship type, zero, one, or more instances of the dependent entity type are attached. The cardinality for the dependent entity type determines how many instances of the dependent entity type can and must be attached to a relationship instance.

In case of the example on the visual, a dependent entity type is based on relationship type _in_ containing nonkey attribute Leg Number. For each relationship instance, i.e., each leg of an itinerary, it specifies the sequence number (leg number) for the appropriate nonstop connection for the itinerary. Since the dependent entity type further describes the legs of the itinerary, we have called it LEG in the above visual. The abbreviated name of the owning relationship type is _as_. According to our naming convention, its full name is:

AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG

Because each leg only receives a single leg number, the cardinality for the dependent entity type must be 0..1 or 1..1. Which of these it is depends on how you want to model it. If you also want to assign a leg number to the only leg of a one-leg itinerary, the cardinality must be 1..1. If you only want to sequence the legs of multi-leg itineraries, the cardinality should be 0..1. We have chosen the first alternative because it treats itineraries more uniformly, prevents legs of a multi-leg itinerary from being left unsequenced, and tends to be more general.

In addition to the leg number, the dependent entity type contains the key of the parent relationship type, i.e., the attributes Flight Number (coming from ITINERARY), From, and To (both from AIRPORT_nonstop_to_AIRPORT).

Because of the maximum cardinality of 1, the key of the (parent) relationship type becomes the key of the dependent entity type.

In addition to the key attributes, the dependent entity type may contain any number of (nondefining) attributes for the relationship type (e.g., the planned departure and arrival times for the leg) as long as the maximum value of the cardinality for the dependent entity type is observed. However, dependent entity types should follow the rules for entity types we have established before. In particular, the dependent entity type should have a sensible meaning for the relationship type (and application domain) and its attributes should all support that meaning. The dependent entity type should not be a garbage collection. If necessary, introduce multiple dependent entity types for the relationship type each having a well-defined meaning.

You also may want to introduce multiple dependent entity types for the relationship type if many of the nondefining attributes are optional. In this case, you might prefer to have a dependent entity type with cardinality 1..1 for the mandatory attributes and one or more others with cardinality 0..1 for the optional attributes. All of the attributes of such an entity type should be optional at the same time: If one of the attributes does not apply, the others do not apply either. The resulting dependent entity types should again have a well-defined meaning for the relationship type whose nondefining attributes they contain.

For attributes requiring a different maximum cardinality, you need different dependent entity types (possibly multiple ones in accordance with the discussions above).





Figure 4-31. Nondefining Attributes - Sample Diagram CF182.0

Notes:

The above instance diagram illustrates dependent entity type LEG, containing the nondefining attributes for relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY (= _in_), for a sample itinerary. Because of cardinality 1..1 for dependent entity type LEG, there is only one dependent entity instance for each instance of relationship type _in_.

The dependent entity instance contains the key of its parent relationship instance and the assigned leg number. The nonstop connection from Atlanta (ATL) to Stuttgart (STR) is the first leg (Leg Number = 1) for itinerary YY3367. The nonstop connection from Stuttgart to Frankfurt (FRA) is the second leg (Leg Number = 2) of the itinerary.
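The sample instances can be written down as data keyed by the relationship key. The following Python sketch is illustrative only: the dictionary layout and the function name `legs_in_order` are assumptions, while the key values and leg numbers are those of the sample itinerary.

```python
# Dependent entity type LEG: at most one instance per instance of
# relationship type _in_, keyed by (Flight Number, From, To).
legs = {
    ("YY3367", "STR", "FRA"): {"Leg Number": 2},
    ("YY3367", "ATL", "STR"): {"Leg Number": 1},
}


def legs_in_order(legs, flight_number):
    """Return the (From, To) pairs of an itinerary sorted by Leg Number."""
    own = [(attributes["Leg Number"], key[1], key[2])
           for key, attributes in legs.items()
           if key[0] == flight_number]
    return [(origin, destination) for _, origin, destination in sorted(own)]


route = legs_in_order(legs, "YY3367")
```

Because the dictionary is keyed by the relationship key, each relationship instance can carry only one LEG instance, which mirrors the maximum cardinality of 1. The leg number alone then establishes the sequence Atlanta to Stuttgart, followed by Stuttgart to Frankfurt.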

[Visual: Nondefining Attributes - Sample Diagram. Itinerary YY3367 with the nonstop connections from ATL to STR and from STR to FRA. Each _in_ relationship instance has one LEG instance attached by _as_: (YY3367, ATL, STR) with Leg Number 1 and (YY3367, STR, FRA) with Leg Number 2.]


Figure 4-32. Attributes for a Sample Relationship Type CF182.0

Notes:

Using (nondefining) attributes, i.e., a dependent entity type, you can replace the two relationship types PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT by a single relationship type PILOT_assigned_to_FLIGHT as illustrated in the lower part of the above visual. Each instance of the new relationship type has associated with it an instance of dependent entity type PILOT ASSIGNMENT specifying the function (CAPTAIN or COPILOT) for the selected pilot on the selected flight.

Since a pilot can be assigned to multiple flights and multiple pilots can be assigned to a flight, relationship type PILOT_assigned_to_FLIGHT is an m:m relationship type. Accordingly, its key consists of the keys for PILOT and FLIGHT.

Since the cardinality for PILOT ASSIGNMENT is 1..1 (a pilot assigned to a flight has one and only one function for that flight), no additional attributes are needed to achieve uniqueness of the entity instances and the key of the parent relationship type becomes the entity key of the dependent entity type.

Both approaches have advantages and disadvantages. The first approach of using two relationship types ensures that not more than two pilots are assigned to a flight and not more than one pilot as captain or copilot, respectively. However, without additional constraints, it does not prevent a pilot from being assigned as captain and copilot to the same flight. (Constraints are discussed later in this unit.)

[Visual: Attributes for a Sample Relationship Type. Upper part: relationship types PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT, each with cardinalities 1 and m. Lower part: m:m relationship type PILOT_assigned_to_FLIGHT with dependent entity type PILOT ASSIGNMENT (D, 1..1) attached by owning relationship type _by_. PILOT has the key Employee Number; FLIGHT has the key Flight Number, From, To, Flight Locator; PILOT ASSIGNMENT has the key Flight Number, From, To, Flight Locator, Employee Number and the attribute Pilot Function.]

The second approach, using a single relationship type, does not prevent the assignment of multiple captains or copilots to a flight without an additional constraint. It also does not prevent more than two pilots from being assigned to a flight. However, because of the uniqueness requirement for the entity key, it ensures that a pilot only assumes one role for a flight.

The second solution is more flexible and open-ended. By removing the appropriate constraints and allowing additional values for attribute Pilot Function, it enables Come Aboard to introduce substitute captains and copilots (i.e., standbys for pilots that fall sick) or to assign multiple captains or copilots to long flights for which the maximum flying period for pilots would be exceeded. However, before these new functions are introduced, they must be discussed with and approved by the domain expert or the appropriate department of expertise. In the case of multiple captains and copilots for long flights, you can easily think of additional attributes for dependent entity type PILOT ASSIGNMENT: for example, the time in the flight when a pilot is captain or copilot.
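The trade-off of the second approach can be made concrete with a small sketch. It is an illustration under assumptions, not part of the CAB model: the class name `PilotAssignments`, the optional limit per function, and the sample employee numbers and flight locator are hypothetical. The uniqueness of the entity key is modelled by a dictionary, and the limit plays the role of an additional constraint.

```python
class PilotAssignments:
    """Dependent entity type PILOT ASSIGNMENT, keyed by (flight key, employee number)."""

    def __init__(self, max_per_function=None):
        # For example {"CAPTAIN": 1, "COPILOT": 1}; None means that no
        # additional constraint is enforced.
        self.max_per_function = max_per_function
        self.rows = {}

    def assign(self, flight, employee_number, function):
        key = (flight, employee_number)
        # Uniqueness of the entity key: one function per pilot and flight.
        if key in self.rows:
            raise ValueError("pilot already has a function on this flight")
        if self.max_per_function is not None:
            limit = self.max_per_function.get(function, 0)
            used = sum(1 for (assigned_flight, _), assigned_function
                       in self.rows.items()
                       if assigned_flight == flight
                       and assigned_function == function)
            if used >= limit:
                raise ValueError(f"no further {function} allowed on this flight")
        self.rows[key] = function


# Hypothetical flight key: Flight Number, From, To, Flight Locator.
flight = ("YY3367", "ATL", "STR", "L001")

constrained = PilotAssignments({"CAPTAIN": 1, "COPILOT": 1})
constrained.assign(flight, "1001", "CAPTAIN")
constrained.assign(flight, "1002", "COPILOT")

open_ended = PilotAssignments()
open_ended.assign(flight, "1001", "CAPTAIN")
open_ended.assign(flight, "1003", "CAPTAIN")   # accepted without the constraint
```

In the constrained variant, assigning pilot 1001 a second function on the same flight fails because of the key, and assigning a second captain fails because of the limit. Dropping the limit or adding entries such as a substitute function corresponds to the flexibility described above.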




Figure 4-33. Relationships on Owning Relationship Type CF182.0

Notes:

It is conceivable that an owning relationship type is the source or target of another relationship type. However, in this case, you can base the other relationship type on the dependent entity type rather than on the owning relationship type as explained in the following.

For simplicity, let us assume that the owning relationship type is the source of the second relationship type. As explained before, the key of the owning relationship type is the key of the dependent entity type. Therefore, the defining attributes for the second relationship type are the key of the dependent entity type and the key of the target.

If the second relationship type had the dependent entity type as its source, its defining attributes would also be the key of the dependent entity type and the key of the target. This means that the potential relationship instances are the same in both cases and that the two implementations of the second relationship type are equivalent. Consequently, you can base the second relationship type on the dependent entity type rather than on the owning relationship type, which simplifies the entity-relationship model.

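(Visual: a relationship type whose source is the owning relationship type is shown next to one whose source is the dependent entity type (D). In both cases, the relationship key consists of the key of the dependent entity type and the key of the target.)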

Figure 4-34. Controlling Property CF182.0

Notes:

The controlling property can be specified for the source or the target of a relationship type or for both. In the entity-relationship model, it is indicated by the letter C at the end of the relationship type to which it applies.

If you specify the controlling property for the source (target), the source (target) instance belonging to a relationship instance is to be deleted when the relationship instance is deleted.

As a modeling construct, the controlling property can only describe what should happen if a relationship instance is deleted. Nevertheless, when talking about an example, people often say: If this relationship instance is deleted, then this source (or target) instance is deleted. This means that they talk about the effects of the controlling property when it is implemented.

The above visual illustrates the controlling property for relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD. The problem statement for our sample airline company Come Aboard specifies that a maintenance record should be deleted if its owning maintenance record is deleted. CAB's maintenance records are hierarchically structured. A maintenance record can belong to another (one) maintenance record, the owning maintenance record, and can have multiple subrecords.

(Visual: unary relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD with cardinalities 1 (Owner) and m and the controlling property C. Maintenance records 004712, 004713, and 004714 belong to maintenance record 004711. The deletion of a relationship instance leads to the deletion of the controlled instance.)

The implied deletion of the subrecords is modeled by specifying the controlling property for the subrecord end (the source) of relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD. As indicated on the visual, the controlling property implies that, as a result of the deletion of the relationship instance connecting maintenance record 004712 to maintenance record 004711, maintenance record 004712 is to be deleted.

The deletion of a maintenance record implies the deletion of all relationship instances having the deleted maintenance record as their target. Consequently, the controlling property for the source of the relationship type implies that all subrecords of a maintenance record are to be deleted if the maintenance record is deleted. Thus, if maintenance record 004711 is deleted, maintenance records 004712, 004713, and 004714 should be deleted as well.





Figure 4-35. Cascading Effect CF182.0

Notes:

The controlling property may have a cascading effect: One deletion may "cause" many others. This is especially true for unary relationship types as illustrated on the above visual:

• The maintenance record originally being deleted is the record with maintenance number 004711 ( 1 ).

• The deletion of maintenance record 004711 causes the deletion of the relationship instances associating maintenance record 004711 with maintenance records 004721 and 004722 ( 2 ) because their target instance is deleted.

• The deletion of the two relationship instances, in conjunction with the controlling property, implies that maintenance records 004721 and 004722 are to be deleted ( 3 ).

• Since maintenance record 004722 was the target of the relationship instance connecting maintenance record 004801 to it, the relationship instance is deleted as well ( 4 ).

• Together with the controlling property, the deletion of the relationship instance interconnecting maintenance records 004722 and 004801 implies that maintenance record 004801 is to be deleted ( 5 ).


• The deletion of maintenance record 004801 causes the deletion of the relationship instance interconnecting maintenance records 004801 and 004802 because the target of the relationship instance has been deleted ( 6 ).

• Finally, the deletion of the relationship instance implies that maintenance record 004802 is to be deleted ( 7 ).

Thus, due to the controlling property, the deletion of a single maintenance record implies the deletion of all maintenance records except maintenance record 002907. Maintenance record 002907 is not interconnected to any of the deleted maintenance records.

The example illustrates very clearly that you must be careful when using the controlling property and must understand its explicit and implicit effects. If the cascading effect is what the application domain wants to achieve (as is the case for the maintenance records), the usage of the controlling property is perfectly all right.
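The cascade described in steps 1 through 7 can be traced with a short Python sketch. It is only an illustration under simplifying assumptions: the relationship instances are held in a dictionary that maps each subrecord (source) to its owning record (target), and the function name delete_record is made up. The record numbers are those of the visual.

```python
# Relationship instances of MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD:
# subrecord (source) -> owning record (target).
belongs_to = {
    "004721": "004711",
    "004722": "004711",
    "004801": "004722",
    "004802": "004801",
}
records = {"004711", "004721", "004722", "004801", "004802", "002907"}


def delete_record(number, records, belongs_to):
    """Delete a record and, through the controlling property, its subrecords."""
    records.discard(number)
    # Relationship instances whose target has been deleted are deleted as well ...
    orphaned = [src for src, tgt in belongs_to.items() if tgt == number]
    for src in orphaned:
        del belongs_to[src]
        # ... and the controlling property then implies deleting the source.
        delete_record(src, records, belongs_to)


delete_record("004711", records, belongs_to)
```

After the call, only maintenance record 002907 remains, which matches the outcome described above.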





Figure 4-36. Controlling for Relationship Type Attributes CF182.0

Notes:

As we discussed before, the nondefining attributes for relationship types are modeled by means of dependent entity types. When a relationship instance is deleted, the dependent entity instance or instances containing the nondefining attributes (values) for the relationship instance must be deleted as well. This can be achieved by means of the controlling property for the dependent entity type.

The visual illustrates this for dependent entity type LEG. If a nonstop connection is removed from an itinerary, its leg number (a nondefining attribute for relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY) should be deleted as well as indicated by the controlling property for LEG.

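(Visual: relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY connects AIRPORT (key Airport Code; roles From and To) and ITINERARY (key Flight Number). Through _as_, it owns dependent entity type LEG (D, cardinality 1..1) with key Flight Number, From, To and attribute Leg Number. The controlling property C is specified for LEG.)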

Figure 4-37. A Second Correction of the CAB Model CF182.0

Notes:

If you scrutinize the attributes for entity types PILOT and MECHANIC for our sample airline company Come Aboard, you will realize that they have attributes (e.g., Last Name, First Name, Address, and Date of Birth) that are common to both of them. They also have attributes that are specific to the particular entity type: Date Last Checkup, Result Last Checkup, Date Next Checkup, and Last Flown On only apply to pilots; Area of Expertise, Type of Certification, Date of Certification, and Security Status only to mechanics.

The common attributes are not specific to pilots or mechanics. Rather, they are common to all employees. Pilots and mechanics are subcategories (or subtypes) of employees. As employees, the common attributes apply to them as well.

Since CAB does not want to distinguish the different types of employees when only processing the employee information, it makes sense to introduce another entity type, called EMPLOYEE, which functions as a supertype and contains the attributes common to all employees. The common attributes are removed from PILOT and MECHANIC so they only contain the attributes that are specific to pilots or mechanics, respectively.

(Visual: PILOT and MECHANIC both carry the general employee information: Employee Number (key), Last Name, First Name, Address, Date of Birth, and so on. In addition, PILOT has the pilot-specific attributes Date Last Checkup, Result Last Checkup, Date Next Checkup, and Last Flown On; MECHANIC has the mechanic-specific attributes Area of Expertise, Type of Certification, Date of Certification, and Security Status.)

The introduction of entity type EMPLOYEE is illustrated on the next visual. It leads to supertypes and subtypes.



Figure 4-38. Supertype and Subtypes CF182.0

Notes:

When categorizing items, you form classes and subclasses. The subclasses structure the elements of the classes. They do not contain different elements. Each member of a subclass also belongs to the (superior) class to which the subclass belongs.

In modeling, the items categorized are the instances of entity types. The superior class is called supertype entity type or supertype. The subclasses are referred to as subtype entity types or subtypes. A supertype may have one or more subtypes. The term class structure is used to denote the structure consisting of a supertype and its subtypes.

For each instance of a subtype, the supertype contains one, and only one, corresponding instance reflecting the fact that a member of a subclass, at the same time, is a member of the corresponding superior class. In the example on the visual, pilots and mechanics are, at the same time, employees. Therefore, for each instance of entity types PILOT or MECHANIC, there must be a corresponding instance in entity type EMPLOYEE.

A supertype can be considered as the (common) generalization of its subtypes. Conversely, the subtypes can be considered as specializations of the supertype. Therefore, the terms generalization and specialization are used in conjunction with supertypes and subtypes.

(Visual: supertype EMPLOYEE (S) with attributes Employee Number (key), Last Name, First Name, Address, Date of Birth, and so on, is connected by an _is_ fork to subtypes PILOT and MECHANIC. Both subtypes are marked D and C, have cardinality 1, and use Employee Number as their key. The total attributes for a pilot or a mechanic are the subtype attributes plus the EMPLOYEE attributes.)

Whereas there must be a supertype instance for each subtype instance, there need not be a subtype instance for every supertype instance. This means that the specialization can be incomplete (partial). Come Aboard, for example, has employees (other than pilots or mechanics) who do not have specific attributes. For them, there is not a subtype.

The supertype/subtype concept implies that the total set of attributes for the conceptual object represented by a subtype instance consists of its subtype attributes and the attributes for the corresponding supertype instance. In our example, the total set of attributes for a pilot consists of his/her pilot-specific attributes (as represented by the PILOT instance) and his/her attributes as an employee (i.e., the attributes of the corresponding EMPLOYEE instance).

Processing-wise, you want both the attributes of the subtype instance and of the corresponding supertype instance when referring to a subtype instance. In contrast, when referring to a supertype instance, i.e., when processing the represented object in the quality expressed by the supertype, you only want the attributes of the supertype instance and not the attributes of any subtype instances associated with it.

When a supertype instance is deleted, any associated subtype instances must be deleted as well because the conceptual object associated with the supertype instance no longer exists. By themselves, a subtype instance and its corresponding supertype instance can be considered as partial instances. Together, they form the complete instance.

Logically, the supertype has a relationship type of the form

supertype_is_subtype

with each subtype (e.g., EMPLOYEE_is_PILOT and EMPLOYEE_is_MECHANIC). To indicate that these relationship types belong to the same class structure, they are combined into a fork whose handle starts at the supertype. (Note that an entity type may be structured into subtypes in more than one way, making it necessary to group the relationship types for a class structure.)

In addition, the supertype is identified by the letter S next to it. Without this indication, the supertype could not be identified for class structures having just one subtype.

The set of _is_ relationship types interconnecting a supertype and its subtypes is referred to as is-bundle. All relationship types of the is-bundle have a supertype cardinality of 1..1 since there must be one and only one supertype instance for every subtype instance. Therefore, the supertype cardinality is omitted. It is considered implied by the letter S for the supertype.

For each supertype instance, a subtype may contain at most one instance. Consequently, the cardinality for the subtype of the _is_ relationship type must be ..1.

As entity types, the subtypes must have an entity key. The most natural choice is the entity key of the supertype. In this case, a subtype instance is always to be connected to the supertype instance with the matching key value. Accordingly, the subtype becomes a dependent entity type and is marked as such.

Since a subtype instance is to be deleted when its supertype instance is deleted, the controlling property applies to the subtypes.
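The following Python sketch illustrates two of the points made above: the total set of attributes for a subtype instance, and the deletion of subtype instances when the supertype instance is deleted. It assumes that each entity type is held as a dictionary keyed by Employee Number. The sample values and the function names total_attributes and delete_employee are made up for this illustration.

```python
# Supertype EMPLOYEE and subtypes PILOT and MECHANIC, each keyed by
# Employee Number. All values are invented sample data.
employees = {1001: {"last_name": "Doe", "first_name": "Jane"}}
pilots = {1001: {"last_flown_on": "2002-01-15"}}
mechanics = {}


def total_attributes(employee_number, subtype):
    """Return the complete instance: supertype plus subtype attributes."""
    return {**employees[employee_number], **subtype[employee_number]}


def delete_employee(employee_number):
    """Delete the supertype instance and, being controlled, its subtype instances."""
    del employees[employee_number]
    for subtype in (pilots, mechanics):
        subtype.pop(employee_number, None)


complete_pilot = total_attributes(1001, pilots)
delete_employee(1001)
```

Because the subtypes share the key of the supertype, the matching supertype instance is found by a simple key lookup. This mirrors the role of the subtypes as dependent entity types.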





Figure 4-39. Bundle Cardinalities CF182.0

Notes:

As discussed before, for each subtype instance, there must be one, and only one, supertype instance. The reversal of this statement is not true. There need not necessarily be a subtype instance for a supertype instance. For a supertype instance, there may also be instances in multiple subtypes. However, a subtype may contain at most one instance for any supertype instance.

It is a characteristic property of a class structure if, for every supertype instance, at least one of the subtypes must contain a corresponding subtype instance.

It is another characteristic property of a class structure if, for a supertype instance, multiple subtypes can contain a corresponding subtype instance.

These two properties are controlled by the bundle cardinality. The bundle cardinality is a cardinality for the is-bundle rather than for the sources and targets of the relationship types it comprises. The bundle cardinality specifies how many relationship instances of the is-bundle (is-relationship instances) a supertype instance must have at least and may have at most. In other words, it specifies how many prongs the fork must have at least and may have at most for a supertype instance.

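(Visual: supertype EMPLOYEE (S) with subtypes PILOT and MECHANIC, each marked D and C with cardinality 1, connected by the _is_ fork. The possible bundle cardinalities are:

0..1 or 1    Employee may be a pilot or a mechanic, but not both
1..1         Employee must be a pilot or a mechanic, but not both
0..m or m    Employee may be a pilot and/or a mechanic
1..m         Employee must be a pilot and/or a mechanic

Bundle cardinalities of the form ..1 are exclusive; bundle cardinalities of the form 1.. are covering.)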

In the entity-relationship model, the bundle cardinality is specified at the point of the fork for the is-bundle where the handle and the prongs meet. It can assume the following values:

0..1 or 1

A supertype instance may have at most one is-relationship instance. This means that a supertype instance need not have a corresponding subtype instance in any of the subtypes. It may have a corresponding instance in at most one subtype.

For the example on the visual, this would mean that an employee need not be a pilot or a mechanic. The employee can be a pilot or mechanic, but cannot be both.

1..1

Every supertype instance must have one and only one is-relationship instance. This means that a supertype instance must have a corresponding subtype instance in one and only one subtype.

For the example on the visual, this would mean that an employee must be a pilot or a mechanic, but cannot be both.

0..m or m

A supertype instance may have any number of is-relationship instances. This means that a supertype instance need not have a corresponding subtype instance in any of the subtypes. It may have corresponding instances (one each) in multiple subtypes.

For the example on the visual, this would mean that an employee need not be a pilot or a mechanic, but can be a pilot or mechanic and can be both.

1..m

A supertype instance must have one or more is-relationship instances. This means that a supertype instance must have corresponding subtype instances (one each) in at least one subtype. It may have corresponding instances in multiple subtypes.

For the example on the visual, this would mean that an employee must be a pilot or mechanic and can be both.

The bundle cardinality is only specified if the supertype has more than one subtype. In case of only a single subtype, the subtype cardinality is sufficient.

As always, the correct choice of the bundle cardinality depends on the application domain. The problem statement for our sample airline company implies that there are employees other than pilots and mechanics. Thus, the bundle cardinality can only be 0..1 or 0..m. Since the business constraints for Come Aboard state that a pilot cannot be a mechanic at the same time, the bundle cardinality must be 0..1 (= 1) for the illustrated example.

Bundle cardinalities of the form ..1 specify that a supertype instance may have subtype instances in at most one subtype. Therefore, the subtype set (the set of subtypes) is referred to as exclusive.




Bundle cardinalities of the form 1.. specify that a supertype instance must have a corresponding subtype instance in at least one subtype. Therefore, the subtype set is referred to as covering.
You should ensure that the subtype cardinalities and the bundle cardinality are compatible. If at least one of the subtype cardinalities is 1..1 (meaning that, for each supertype instance, there must be a corresponding instance in this subtype), the bundle cardinality should be 1.. (1..1 or 1..m).
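The meaning of the four bundle cardinalities can be summarized in a small Python sketch. It is an illustration only: the bundle cardinality is represented by a minimum and a maximum, with None standing for the open upper bound m, and the function names are made up.

```python
def satisfies_bundle(count, minimum, maximum=None):
    """Check the number of is-relationship instances of one supertype instance.

    count   -- number of subtypes holding an instance for the supertype instance
    minimum -- lower bound of the bundle cardinality (0 or 1)
    maximum -- upper bound; None stands for the open upper bound m
    """
    if count < minimum:
        return False
    return maximum is None or count <= maximum


def is_exclusive(maximum):
    """Bundle cardinalities of the form ..1 make the subtype set exclusive."""
    return maximum == 1


def is_covering(minimum):
    """Bundle cardinalities of the form 1.. make the subtype set covering."""
    return minimum >= 1


# Come Aboard: bundle cardinality 0..1 for EMPLOYEE with PILOT and MECHANIC.
other_employee_ok = satisfies_bundle(0, 0, 1)        # neither pilot nor mechanic
pilot_and_mechanic_ok = satisfies_bundle(2, 0, 1)    # both at the same time
```

With bundle cardinality 0..1, an employee without a subtype instance is accepted, whereas an employee who is both pilot and mechanic is not.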




Figure 4-40. An Alternate Maintenance Record Solution CF182.0

Notes:

When we discussed the business relationship type between aircraft and maintenance records before, we determined that the business relationship type cannot be expressed by a relationship type in the entity-relationship model. The reason was that the maintenance records for an aircraft, including the aircraft number, must be kept even after the remaining information about the aircraft has been deleted. This led to the conclusion that the aircraft number must be an attribute of entity type MAINTENANCE RECORD.

Using a class structure for the maintenance records, the business relationship type can be expressed by a relationship type, but only for the maintenance records of existing aircraft:

• The maintenance records are subdivided into two subtypes: Maintenance records for aircraft owned by CAB (entity type ACTIVE RECORD) and maintenance records for aircraft no longer owned by CAB (entity type ARCHIVE RECORD).

Since a maintenance record must be either an active record or an archive record, the bundle cardinality must be 1..1. Both entity types are dependent entity types and the key of the supertype (Maintenance Number) is also the entity key of the subtypes.

(Visual: supertype MAINTENANCE RECORD (S; key Maintenance Number; attributes Date of Maintenance, Type of Maintenance, and so on) with the unary relationship type _belongs_to_ (cardinalities m and 1, Owner, controlling property C) and an _is_ fork with bundle cardinality 1..1 to the dependent subtypes ARCHIVE RECORD (key Maintenance Number; attributes Aircraft Number, Retention Date, and so on) and ACTIVE RECORD (key Maintenance Number; further attributes marked as open with ???). ACTIVE RECORD is connected to AIRCRAFT by relationship type _for_ with cardinalities m and 1..1.)

• Since the remaining aircraft information no longer exists for archive maintenance records, subtype ARCHIVE RECORD must include the serial number of the aircraft to which it belonged (Aircraft Number). It may contain other attributes, such as the date until when the maintenance record must be retained (Retention Date), that only exist for archive maintenance records.
• Besides the entity key, subtype ACTIVE RECORD may contain additional attributes that only exist for active maintenance records. But are there any? This illustrates the possibility of entity types containing just the entity key. As we mentioned before, you should be suspicious of entity types that have no nonkey attributes. You should be suspicious here as well and question whether this is a good solution.

• By definition, active maintenance records belong to aircraft owned by Come Aboard. Therefore, their relationship to aircraft can be expressed by a relationship type between ACTIVE RECORD and AIRCRAFT.

• In general, other relationship types having MAINTENANCE RECORD as their source or target are not affected by the introduction of the class structure.

If an aircraft is no longer owned by CAB and its entity instance is removed, the appropriate instances for its maintenance records must be moved from subtype ACTIVE RECORD to subtype ARCHIVE RECORD. This is enforced by the entity-relationship model:

• The target cardinality of 1..1 for relationship type ACTIVE RECORD_for_AIRCRAFT requires, at all times, an aircraft for an active maintenance record. Consequently, if an aircraft is deleted, its active maintenance record instances must either be assigned to other aircraft or removed from ACTIVE RECORD. Since it would be incorrect to assign them to other aircraft, they must be removed from ACTIVE RECORD.

• On the other hand, bundle cardinality 1..1 requires that each instance of MAINTENANCE RECORD has an instance in either ARCHIVE RECORD or ACTIVE RECORD. If an aircraft is deleted, its maintenance records cannot have instances in ACTIVE RECORD as explained before. Thus, they must have instances in ARCHIVE RECORD.

As an instance is moved from ACTIVE RECORD to ARCHIVE RECORD, the serial number for the aircraft it belonged to (Aircraft Number) must be added along with any other attributes for archive maintenance records.
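The move from ACTIVE RECORD to ARCHIVE RECORD can be sketched in Python as follows. This is a simplified illustration, not a prescribed implementation: each entity type is held as a dictionary keyed by Maintenance Number, the relationship to AIRCRAFT is stored inside the active record, and the aircraft number, the retention date, and the function names are invented.

```python
# Supertype and subtypes keyed by Maintenance Number (sample data invented).
maintenance_records = {"004711": {"type_of_maintenance": "engine check"}}
active = {"004711": {"aircraft": "A-1"}}   # ACTIVE RECORD_for_AIRCRAFT
archive = {}                                # ARCHIVE RECORD
aircraft = {"A-1"}


def delete_aircraft(aircraft_number, retention_date):
    """Delete an aircraft and move its maintenance records to the archive."""
    aircraft.discard(aircraft_number)
    affected = [n for n, rec in active.items() if rec["aircraft"] == aircraft_number]
    for number in affected:
        # Target cardinality 1..1: no active record without its aircraft.
        del active[number]
        # Bundle cardinality 1..1: the record must now be an archive record,
        # which has to carry the aircraft number itself.
        archive[number] = {
            "aircraft_number": aircraft_number,
            "retention_date": retention_date,
        }


def bundle_ok():
    """Every maintenance record is in exactly one of the two subtypes."""
    return all((n in active) != (n in archive) for n in maintenance_records)


delete_aircraft("A-1", "2012-12-31")
```

The check bundle_ok holds before and after the deletion, which reflects how the two cardinalities together force the move.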




Figure 4-41. ER Model for CAB Without Constraints CF182.0

Notes:

The above entity-relationship model for our sample airline company includes the changes discussed since we established the cardinalities for the relationship types. However, it does not include the alternate maintenance record solution on the previous visual since it does not really provide an improvement in our case.

Note the following changes:

• Entity type AIRCRAFT TYPE has been introduced. Entity type AIRCRAFT MODEL becomes dependent on it.

• Relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD is controlling for its source.

• Relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY has Leg Number as nondefining attribute in dependent entity type LEG.

• Relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_for_FLIGHT has been replaced by relationship type FLIGHT_for_LEG. Note that entity type FLIGHT becomes dependent on LEG: The values of a portion of its entity key must always match up with the appropriate values of the key of LEG.
The new relationship type seems to be more natural because its target is entity type LEG. However, we can do this only because:

- We chose to have a leg number even for one-leg itineraries (cardinality of LEG is 1..1 in AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG); otherwise, there would not be an entity instance of LEG for nonstop connections without leg number to which we could connect flights.

- The two relationship types have the same defining attributes, the key of entity type FLIGHT. (Also for the old relationship type, FLIGHT would have been a dependent entity type.)

• AIRCRAFT MODEL_for_AIRPORT_nonstop_to_AIRPORT_in_ITINERARY has been replaced by relationship type AIRCRAFT MODEL_for_LEG. Again, this can only be done because the cardinality of LEG is 1..1 in relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG and the defining attributes of the new relationship type are the same as for the old relationship type. (Note that the key of LEG is the same as the key of AIRPORT_nonstop_to_AIRPORT_in_ITINERARY.)

• The two relationship types PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT have been replaced by relationship type PILOT_assigned_to_FLIGHT as discussed.

• EMPLOYEE has been introduced as a supertype for PILOT and MECHANIC.



4.4 Constraints



Figure 4-42. Constraints CF182.0

Notes:

Constraints are interdependencies between the objects of an entity-relationship model restricting the possible instances that entity types or relationship types can assume. The interdependent objects can be attributes, entity types, or relationship types. A single constraint can restrict the instances of multiple entity types and/or relationship types.

Logically, a constraint consists of three components: a set of constraining objects, a set of constrained objects, and a rule. The constraining objects can be attributes, entity types, or relationship types. Their values (attributes) or instances (entity types or relationship types) restrict the instances of the constrained objects. The constrained objects may be entity types or relationship types. The rule specifies how the values or instances of the constraining objects restrict the instances of the constrained objects.

There may be all kinds of constraints for the entity types and relationship types of an entity-relationship model. The simplest form of a constraint restricts the values of an attribute of an entity type and, thus, the instances that the entity type can assume. In this case, the constraining object is the attribute and the constrained object is the entity type. The rule describes how the values of the attribute, and, thus, the instances of the entity type, are constrained.

In principle, the value ranges (domains) of attributes could be considered as constraints. However, these are not the constraints you would like to visualize in an entity-relationship model since they would clutter it. You do need to document the domains of the attributes (more precisely, of the data elements on which the attributes are based), but you do this outside the entity-relationship model, namely, in the data inventory described in Unit 5 - Data and Process Inventories.

In the entity-relationship model, you should only document restrictions for attributes that go beyond the limitations imposed by the domains. The constraints that you really want to visualize in an entity-relationship model are those where an attribute, entity type, or relationship type constricts the instances of a different entity type or relationship type.

You should not formulate something as a constraint if it can reasonably be expressed by other modeling constructs. However, there will always be constraints that cannot be expressed by other modeling constructs even if additional modeling constructs were introduced. The variety of possible constraints is so immense that it is impossible to cover them all by additional modeling constructs.

The primary source for constraints is the set of business constraints for the considered application domain. Generally, a nontrivial application domain will have many constraints. So does the application domain for our sample airline company. For example, the business constraint that an aircraft cannot have more engines mounted than the aircraft type (!) allows gives rise to a constraint for entity type AIRCRAFT. We will study this and further examples on the subsequent visuals.

Besides the constraints resulting from the business constraints, an entity-relationship model may also contain other constraints that cannot directly be derived from business constraints. We will see such an example as well.




Figure 4-43. Constraints in ER Model CF182.0

Notes:

As mentioned before, a constraint can limit the instances of a single object or of multiple objects. If a single entity type or relationship type is constrained, the constraint is positioned near the constrained object.

If multiple objects are constrained by the same constraint, the constrained objects are interconnected by a dotted line and the constraint is placed next to the connecting dotted line. To avoid cluttering and to maintain the clearness of the entity-relationship model, you may prefer not to connect the constrained objects, but rather repeat the constraint for every constrained object. If the constrained objects are far apart in the entity-relationship model, it may be difficult or even impractical to interconnect them.

To visualize the interdependency, a dotted arrow may be drawn from the constraining object (in case of an attribute from its entity type) to a constrained object and the constraint placed next to it.

In the entity-relationship model, the constraints themselves are documented as follows:

• The constraints are enclosed in braces.

• Each constraint consists of a unique identifier which is optionally followed by a colon (:) and the rule describing the interdependency.

• The unique identifier for the constraint can be anything you like. Usually, it is a number. Its purpose is to tie together repetitions of the same constraint (if multiple objects are constrained by a single constraint as explained above) and to identify a detailed description of the constraint outside the diagram.

• The colon and the rule may only be omitted if an outside description of the constraint is provided.

The format of a single constraint for an object is:

{ identifier [ : rule ] }

Multiple constraints for an object may be placed within the same enclosing braces. The different constraints are separated by semicolons (;):

{ id-1 : rule-1 ; id-2 : rule-2 ; id-3 : rule-3 }

For the rule, you may use conditional expressions or formulas, if applicable, or natural language text. Natural language text may be easier to understand, but holds the danger of ambiguities. However, many of the rules can only be formulated using natural language.

It would be possible to define a formal notation using conditional expressions, mathematical symbols, set symbols, and functional operators covering most of the cases, but this formal notation would be complex and not necessarily enhance the clarity of the entity-relationship model. Most of the time, natural language may still be your best choice. Therefore, we will use natural language in most of the examples in this document.
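To make the brace notation for constraints concrete, the following Python sketch splits a constraint annotation into identifiers and rules. It is only an illustration under the assumption that a rule contains no semicolon; the function name parse_constraints is made up and is not part of any modeling tool.

```python
def parse_constraints(text):
    """Split '{ id-1 : rule-1 ; id-2 : rule-2 }' into {identifier: rule}.

    A constraint without a rule (allowed when an outside description exists)
    is returned with the value None.
    """
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("constraints must be enclosed in braces")
    result = {}
    for part in text[1:-1].split(";"):
        # Only the first colon separates the identifier from the rule.
        identifier, separator, rule = part.partition(":")
        identifier = identifier.strip()
        if not identifier:
            raise ValueError("every constraint needs an identifier")
        result[identifier] = rule.strip() if separator else None
    return result


parsed = parse_constraints("{ 1 : engines mounted <= engines allowed ; 2 }")
```

Here, constraint 1 carries its rule in the diagram, whereas constraint 2 consists of the identifier only and therefore relies on an outside description.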




Figure 4-44. Constraints (Example 1) CF182.0

Notes:

The problem statement for Come Aboard in Unit 3 - Problem Statement states as a business constraint that an aircraft cannot have more engines mounted than the aircraft model allows. In the meantime, we have learned that the number of engines is rather a characteristic of the aircraft type and, therefore, an attribute of entity type AIRCRAFT TYPE (and not of entity type AIRCRAFT MODEL) as illustrated in the above entity-relationship model portion.

Attribute Number of Engines of entity type AIRCRAFT TYPE is the constraining object of the constraint. It restricts how many values the attribute Engine may have for instances of entity type AIRCRAFT. Thus, it constrains the instances of entity type AIRCRAFT, the constrained object.

The dotted arrow from AIRCRAFT TYPE to AIRCRAFT visualizes who constrains whom. The braces next to the arrow contain the identifier (1) and the rule for the constraint.

At the bottom of the visual, an outside (of the entity-relationship model) description of the constraint is given. It repeats the rule and provides an explanation. A more complete outside description should list the constraining objects (Number of Engines in entity type AIRCRAFT TYPE) and the constrained objects (AIRCRAFT).

Constraints (Example 1)

[Visual: entity types AIRCRAFT TYPE (Type Code (K), Category, Manufacturer, Number of Engines), AIRCRAFT MODEL, and AIRCRAFT (Aircraft Number (K), Date Acquired, Engine: value-1, value-2, value-3, value-4), connected by _for_ relationship types. A dotted arrow from AIRCRAFT TYPE to AIRCRAFT carries the constraint below.]

{ 1 : Number of engines for aircraft ≤ Number of engines for aircraft type }

Constraint No. 1

Rule: Number of engines for aircraft ≤ Number of engines for aircraft type

Explanation: An aircraft cannot have more engines mounted than the aircraft type allows
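
The rule for constraint 1 can be checked mechanically once the constraining attribute (Number of Engines) and the constrained instances (AIRCRAFT) are available. The following Python sketch is only an illustration under simplifying assumptions: the class and function names are hypothetical, and the path from AIRCRAFT through AIRCRAFT MODEL to AIRCRAFT TYPE is flattened into a direct reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AircraftType:
    type_code: str            # entity key of AIRCRAFT TYPE
    number_of_engines: int    # constraining object of constraint 1


@dataclass
class Aircraft:
    aircraft_number: str      # entity key of AIRCRAFT
    # Assumption: the aircraft type is referenced directly instead of
    # being reached through AIRCRAFT MODEL as in the model.
    aircraft_type: AircraftType
    engines: List[str] = field(default_factory=list)  # multi-valued attribute Engine


def satisfies_constraint_1(aircraft: Aircraft) -> bool:
    """Number of engines for aircraft <= number of engines for aircraft type."""
    return len(aircraft.engines) <= aircraft.aircraft_type.number_of_engines


if __name__ == "__main__":
    twin = AircraftType(type_code="T1", number_of_engines=2)
    plane = Aircraft("A-001", twin, engines=["E1", "E2"])
    print(satisfies_constraint_1(plane))
```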




Notes:

This example demonstrates that a business constraint may result in multiple constraints for the entity-relationship model.

CAB has a business constraint specifying that the captain and copilot for a flight must be different. Assuming that the pilot assignment is modeled by the two relationship types PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT (original solution), the translation of the business constraint into constraints for the entity-relationship model results in two constraints:

• The first constraint (2a) constrains the instances of relationship type PILOT_copilot_for_FLIGHT by requiring that a pilot that has already been assigned as captain to a flight cannot become the copilot of the flight as well. Thus, the instances of PILOT_copilot_for_FLIGHT are constrained by the instances of PILOT_captain_for_FLIGHT. If PILOT_captain_for_FLIGHT already contains an instance for a specified pilot and flight, an instance for them must not be added to PILOT_copilot_for_FLIGHT.

Accordingly, the constraining object is relationship type PILOT_captain_for_FLIGHT and the constrained object is PILOT_copilot_for_FLIGHT. The rule is: The captain for a flight cannot become the copilot at the same time. A more formal notation for the rule could be:

(pilot, flight) ∈ PILOT_captain_for_FLIGHT ⇒ (pilot, flight) ∉ PILOT_copilot_for_FLIGHT

• Conversely, the second constraint (2b) constrains the instances of relationship type PILOT_captain_for_FLIGHT by requiring that a pilot that has already been assigned as copilot to a flight cannot become the captain of the flight. Thus, the instances of PILOT_captain_for_FLIGHT are constrained by the instances of PILOT_copilot_for_FLIGHT. If PILOT_copilot_for_FLIGHT already contains an instance for the specified pilot and flight, an instance for them cannot be added to PILOT_captain_for_FLIGHT.

Accordingly, the constraining object is relationship type PILOT_copilot_for_FLIGHT and the constrained object is PILOT_captain_for_FLIGHT. The rule is: The copilot for a flight cannot become the captain at the same time. In this case, a more formal notation for the rule could be:

(pilot, flight) ∈ PILOT_copilot_for_FLIGHT ⇒ (pilot, flight) ∉ PILOT_captain_for_FLIGHT

[Visual: entity types PILOT and FLIGHT connected by relationship types _captain_for_ and _copilot_for_ (cardinalities 1 and m each), annotated with constraints { 2a } and { 2b }.]

{ 2a } { 2b }

Constraint No. 2a

Rule: The captain for a flight cannot become the copilot at the same time

Explanation: A pilot that has been assigned as captain to a flight cannot become copilot for the flight at the same time. This means that relationship type PILOT_copilot_for_FLIGHT cannot receive a relationship instance already contained in PILOT_captain_for_FLIGHT.

Constraint No. 2b

Rule: The copilot for a flight cannot become the captain at the same time

Explanation: A pilot that has been assigned as copilot to a flight cannot become captain for the flight at the same time. This means that relationship type PILOT_captain_for_FLIGHT cannot receive a relationship instance already contained in PILOT_copilot_for_FLIGHT.
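
Constraints 2a and 2b together say that the two relationship types must never contain the same (pilot, flight) pair. A minimal Python sketch of this idea follows. It is not a prescribed implementation: the class name PilotAssignments and its methods are hypothetical, and each relationship type is modeled simply as a set of (pilot, flight) pairs.

```python
class PilotAssignments:
    """Holds the instances of the two relationship types as sets of pairs."""

    def __init__(self):
        self.captain_for = set()   # instances of PILOT_captain_for_FLIGHT
        self.copilot_for = set()   # instances of PILOT_copilot_for_FLIGHT

    def assign_captain(self, pilot, flight):
        # Constraint 2b: PILOT_copilot_for_FLIGHT constrains this relationship type.
        if (pilot, flight) in self.copilot_for:
            raise ValueError("the copilot for a flight cannot become the captain")
        self.captain_for.add((pilot, flight))

    def assign_copilot(self, pilot, flight):
        # Constraint 2a: PILOT_captain_for_FLIGHT constrains this relationship type.
        if (pilot, flight) in self.captain_for:
            raise ValueError("the captain for a flight cannot become the copilot")
        self.copilot_for.add((pilot, flight))


if __name__ == "__main__":
    assignments = PilotAssignments()
    assignments.assign_captain("P1", "F1")
    assignments.assign_copilot("P2", "F1")
    print(assignments.captain_for & assignments.copilot_for)  # intersection stays empty
```

The sketch checks each constraint at the moment an instance is added, which mirrors the wording that an instance "must not be added" if the other relationship type already contains it.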





Notes:

This visual illustrates the constraint required if the pilot assignment is modeled using a single relationship type with nondefining attributes (dependent entity type PILOT ASSIGNMENT). As we discussed before, in this case, the uniqueness of the entity key of dependent entity type PILOT ASSIGNMENT automatically takes care of the requirement that a pilot be assigned only once to a flight.

However, in order to ensure that not more than one captain and not more than one copilot are assigned to a flight, we need a constraint. The rule for the constraint is simply that the value of attribute Pilot Function must be unique for each flight. In other words, the quintuplet of attributes (Flight Number, From, To, Flight Locator, Pilot Function) must be unique.

In this case, the five attributes are the constraining objects and entity type PILOT ASSIGNMENT is the constrained object.

Note that the above constraint does not restrict the values of attribute Pilot Function to the two values CAPTAIN and COPILOT. This should be achieved through the domain definition for the appropriate data element.


Furthermore, this constraint is not a direct derivative of a business constraint. It originates from the way we have modeled the pilot assignment. It does not enforce that the captain and copilot for a flight are different. (This is achieved in another way.) It only enforces that each function is only assigned once to a flight.

[Visual: entity types FLIGHT (Flight Number (K), From (K), To (K), Flight Locator (K), . . .) and PILOT (Employee Number (K), . . .) connected by m:m relationship type _assigned_to_. Dependent entity type PILOT ASSIGNMENT (Flight Number (K), From (K), To (K), Flight Locator (K), Employee Number (K), Pilot Function) is attached via relationship type _by_ (1..1, D) and carries the constraint below.]

{ 2 : Pilot Function must be unique for a flight }

Constraint No. 2

Rule: Pilot Function must be unique for a flight

Explanation: For a flight, each function (CAPTAIN or COPILOT) must only be assigned once. This means the combination (Flight Number, From, To, Flight Locator, Pilot Function) must be unique.
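
The uniqueness rule for constraint 2 can be illustrated by counting how often each combination of flight key and Pilot Function occurs. The Python sketch below is an assumption-laden illustration: the function name violations_of_constraint_2 is hypothetical, and each PILOT ASSIGNMENT instance is represented as a tuple (flight key, employee number, pilot function), where the flight key is itself the tuple (Flight Number, From, To, Flight Locator).

```python
from collections import Counter


def violations_of_constraint_2(assignments):
    """Return the (flight key, pilot function) combinations assigned more than once."""
    counts = Counter(
        (flight_key, pilot_function)
        for flight_key, _employee_number, pilot_function in assignments
    )
    return sorted(combo for combo, count in counts.items() if count > 1)


if __name__ == "__main__":
    flight = ("CA100", "AAA", "BBB", "X1")   # hypothetical flight key values
    sample = [
        (flight, "E1", "CAPTAIN"),
        (flight, "E2", "COPILOT"),
        (flight, "E3", "CAPTAIN"),           # second captain for the same flight
    ]
    print(violations_of_constraint_2(sample))
```

Note that the check says nothing about whether the same employee holds both functions. As stated above, that requirement is handled separately by the entity key of PILOT ASSIGNMENT.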




Notes:

Come Aboard has a business constraint requiring that the pilots for a flight must have the license to fly the aircraft model for the leg for the flight, i.e., can fly the aircraft model. The above visual illustrates how this business constraint can be translated into a constraint for the entity-relationship model.

For the constraint, relationship types PILOT_can_fly_AIRCRAFT MODEL, FLIGHT_for_LEG, and AIRCRAFT MODEL_for_LEG are the constraining objects since their instances determine the pilots that can be assigned to a given flight: The aircraft model must be for the leg of the flight and the pilot must be able to fly the aircraft model. In other words, for a given flight, the aircraft model must be determined using relationship types FLIGHT_for_LEG and AIRCRAFT MODEL_for_LEG. Then, the resulting aircraft model must be used to determine the pilots that can fly the aircraft model.

The constrained object is relationship type PILOT_assigned_to_FLIGHT because only special pilots can be assigned to a flight, namely, those that can fly the aircraft model for the leg for the flight (rule).

Note that dependent entity type PILOT ASSIGNMENT for relationship type PILOT_assigned_to_FLIGHT is not shown on the visual.

You might get the idea that you could avoid the constraint by having a relationship type

(PILOT_can_fly_AIRCRAFT MODEL)_assigned_to_(AIRCRAFT MODEL_for_LEG),

interconnecting PILOT_can_fly_AIRCRAFT MODEL and AIRCRAFT MODEL_for_LEG, and basing the relationship type assigning pilots to flights on this relationship type rather than on PILOT. Not so since an instance of AIRCRAFT MODEL_for_LEG could be paired with any instance of PILOT_can_fly_AIRCRAFT MODEL, even with one that had a different aircraft model! There is nothing in the relationship type definition enforcing that only particular instances can be interconnected. Only a constraint could ensure that only instances with the same aircraft model were interconnected.

[Visual: entity types PILOT, FLIGHT, LEG, and AIRCRAFT MODEL with relationship types PILOT_assigned_to_FLIGHT, FLIGHT_for_LEG, AIRCRAFT MODEL_for_LEG, and PILOT_can_fly_AIRCRAFT MODEL. The visual marks the constraint as { 3 } and combines the constraining objects with AND.]

{ 3 }

Constraint No. 3

Rule: Pilot for flight must have license to fly aircraft model for leg

Explanation: A pilot assigned to a flight must be able, i.e., have the license, to fly the aircraft model for the leg for the flight.
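
The navigation described for constraint 3 (from the flight to its leg, from the leg to the aircraft model, and from there to the pilots that can fly it) can be sketched as a lookup chain. The Python fragment below is only an illustration: the function name and the data layout are hypothetical, and it assumes, in line with the wording "the aircraft model for the leg", that exactly one aircraft model is recorded per leg.

```python
def satisfies_constraint_3(pilot, flight, flight_for_leg, model_for_leg, can_fly):
    """Check that the pilot may be assigned to the flight.

    flight_for_leg: dict mapping each flight to its leg (FLIGHT_for_LEG)
    model_for_leg:  dict mapping each leg to its aircraft model
                    (AIRCRAFT MODEL_for_LEG, assumed one model per leg)
    can_fly:        set of (pilot, aircraft model) pairs
                    (PILOT_can_fly_AIRCRAFT MODEL)
    """
    leg = flight_for_leg[flight]      # constraining object 1
    model = model_for_leg[leg]        # constraining object 2
    return (pilot, model) in can_fly  # constraining object 3


if __name__ == "__main__":
    flight_for_leg = {"F1": "L1"}
    model_for_leg = {"L1": "M1"}
    can_fly = {("P1", "M1"), ("P2", "M2")}
    print(satisfies_constraint_3("P1", "F1", flight_for_leg, model_for_leg, can_fly))
    print(satisfies_constraint_3("P2", "F1", flight_for_leg, model_for_leg, can_fly))
```

All three constraining relationship types take part in the check, which corresponds to the AND shown on the visual.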



4.5 Splitting and Combining Entity-Relationship Models


Figure 4-48. Subdivision of ER Model into Pages CF182.0

Notes:

Most of the time, the entity-relationship model for an application domain will not fit onto a single page. Sure, you can use a bigger piece of paper, but this will only alleviate the problem and not solve it. The consequence is that the entity-relationship model must be split into pieces fitting on a single page.

Your first attempt should be to identify autonomous subareas of the application domain and to separate their entity-relationship models. If you cannot find such subareas or their entity-relationship models do not fit on a single page, try to identify different views with which you can look at the application domain and establish the entity-relationship models for them. A view comprises all objects of the entity-relationship model that a specific group of people needs to know or that concerns them.

For our sample airline company called Come Aboard, a sample view would be the Pilot View, which comprises all entity types, relationship types, and constraints that pilots need to know about or that apply to them. Another view would be the Maintenance View, which includes all entity types, relationship types, and constraints concerning aircraft maintenance. Both views are illustrated on the subsequent visuals.

If the subareas or views are still too large, try to find smaller logical units that you can break out and that will fit onto one page.

If none of the above helps, you just have to break out any pieces of the entity-relationship model that will fit onto one page. Try to break them out in such a way that the entity types and relationship types on that page have as few relationship types to entity types or relationship types on other pages as possible. Of course, you need to illustrate the page-crossing relationship types on other pages. There, you need to repeat the entity types or relationship types of this page that are their sources or targets.

Generally, the submodels on the various pages will overlap. Some parts of the entire entity-relationship model will occur on multiple pages. The submodels must not conflict with each other. Together, they must cover the entire application domain, i.e., cover all portions of the entire, undivided, entity-relationship model for the application domain.

Subdivision of ER Model into Pages

Most of the time, an entity-relationship model will not fit onto one page

• Must subdivide ER model into pieces that fit onto one page

Determine subareas and/or different views of application domain

• Establish entity-relationship model for subareas or views

If subareas or views of application domain are still too large, try to find smaller logical subsets you can break out

If nothing of the above helps, break out units in such a way that:

• Entity and relationship types have as few relationship types to objects on other pages as possible

• Repeat entity or relationship types on other pages to illustrate relationship types

The various submodels will overlap

Together, the submodels must cover the entire entity-relationship model (application domain)




Figure 4-49. Pilot View of ER Model for CAB CF182.0

Notes:

This visual illustrates the Pilot View for Come Aboard. It comprises all entity types, relationship types, and constraints that pilots need to know or are concerned with.

Pilots want to know the flights they have been assigned to and their function on the flight. Therefore, the view needs to include, besides entity type PILOT, entity types FLIGHT and PILOT ASSIGNMENT and relationship types PILOT_assigned_to_FLIGHT and PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT.

Furthermore, pilots want to know to which leg of the itinerary a flight belongs and all information about the airports for the leg. Thus, the view must include entity types LEG, ITINERARY, and AIRPORT and relationship types FLIGHT_for_LEG, AIRPORT_nonstop_to_AIRPORT, AIRPORT_nonstop_to_AIRPORT_in_ITINERARY, and AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG.

In addition, pilots need to know everything about the aircraft for the flight including its model and type. Consequently, the view must comprise entity types AIRCRAFT, AIRCRAFT MODEL, and AIRCRAFT TYPE and relationship types AIRCRAFT_for_FLIGHT, AIRCRAFT MODEL_for_AIRCRAFT, and AIRCRAFT TYPE_for_AIRCRAFT MODEL.

The illustrated constraint is the only one concerning the entity types and relationship types of this entity-relationship model view. Business constraints "Pilots for Flight Must Have License for Aircraft Model for Leg" and "Only Aircraft Model With Start and Landing Rights for Legs" concern relationship type AIRCRAFT MODEL_for_LEG, which is not part of this view. Pilots need not necessarily know the corresponding constraints. Rather, the constraints deal with flight planning and pilot assignment, which are done by different groups of people, and would have to appear in the appropriate views.

Pilot View of ER Model for CAB

[Visual: entity types PILOT, PILOT ASSIGNMENT, FLIGHT, LEG, ITINERARY, AIRPORT, AIRCRAFT, AIRCRAFT MODEL, and AIRCRAFT TYPE, connected by relationship types _assigned_to_, _by_, _for_, _nonstop_to_ (roles From and To), _in_, and _as_. The visual carries the constraint below.]

{ 2 : Pilot Function must be unique for flight }



Figure 4-50. Maintenance View of ER Model for CAB CF182.0

Notes:

The Maintenance View comprises all entity types, relationship types, and constraints needed for the scheduling or performance of aircraft maintenance.

The scheduling concerns mechanics, aircraft, and aircraft models and must select mechanics from those trained for the aircraft model of the aircraft to be serviced. Thus, the maintenance view must include entity types MECHANIC, AIRCRAFT MODEL, and AIRCRAFT and relationship types AIRCRAFT MODEL_for_AIRCRAFT, MECHANIC_trained_for_AIRCRAFT MODEL, and MECHANIC_scheduled_for_AIRCRAFT.

During a maintenance, mechanics must write maintenance records or look at them. Therefore, the view must include entity type MAINTENANCE RECORD and relationship types MAINTENANCE RECORD_from_MECHANIC and MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD.

Furthermore, a mechanic needs to know information about the aircraft, its model, and its type. Consequently, the view must also include entity type AIRCRAFT TYPE and relationship type AIRCRAFT TYPE_for_AIRCRAFT MODEL.

There are two constraints applicable to the Maintenance View:

• The first constraint (4) requires that a new maintenance record is for an existing aircraft. This constraint enforces that the aircraft number for a new maintenance record belongs to an aircraft owned by CAB.

• The second constraint (5) ensures that only mechanics trained for the affiliated aircraft model are scheduled for the service of an aircraft.

Maintenance View of ER Model for CAB

[Visual: entity types MECHANIC, MAINTENANCE RECORD, AIRCRAFT, AIRCRAFT MODEL, and AIRCRAFT TYPE, connected by relationship types _trained_for_, _scheduled_for_, _from_, _belongs_to_ (role Owner), and _for_. Constraints { 4 } and { 5 } are annotated on the visual; { 5 } is combined with AND.]

{ 4 : New maintenance record only for existing aircraft }

{ 5 }

Constraint No. 5

Rule: Only trained mechanics for aircraft maintenance

Explanation: A mechanic can only service an aircraft if he/she has been trained for the appropriate aircraft model
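
The two constraints of the Maintenance View can be illustrated with two small checks. The Python sketch below is not taken from the course material: the function names and the data layout are hypothetical, with AIRCRAFT MODEL_for_AIRCRAFT represented as a dictionary from aircraft number to aircraft model and MECHANIC_trained_for_AIRCRAFT MODEL as a set of pairs.

```python
def satisfies_constraint_4(aircraft_number, existing_aircraft):
    """A new maintenance record must be for an existing aircraft."""
    return aircraft_number in existing_aircraft


def satisfies_constraint_5(mechanic, aircraft_number, model_for_aircraft, trained_for):
    """A mechanic may only be scheduled for an aircraft whose model he/she is trained for."""
    model = model_for_aircraft[aircraft_number]
    return (mechanic, model) in trained_for


if __name__ == "__main__":
    model_for_aircraft = {"A-001": "M1", "A-002": "M2"}
    trained_for = {("MEC1", "M1")}
    print(satisfies_constraint_4("A-003", model_for_aircraft))
    print(satisfies_constraint_5("MEC1", "A-002", model_for_aircraft, trained_for))
```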




Figure 4-51. Building an Enterprise-Wide ER Model CF182.0

Notes:

These days, many companies want to establish an enterprise-wide entity-relationship model. For bigger companies, an enterprise-wide entity-relationship model comprises multiple application domains. Such a model might be too complex to build in its entirety right away, especially since you will rarely find a single domain expert who fully understands all application domains involved.

As a consequence, it might be better to start with separate models for the various application domains and then to consolidate them. It is necessary to consolidate the results of every step of the design process for all application domains before continuing on to the next step for any of the application domains. If you do not do this, the results most likely will not fit together.

For the entity-relationship models, this means two things:

1. You should consolidate the problem statements of the various application domains before developing the respective entity-relationship models.

2. You should consolidate the entity-relationship models before proceeding to the next step of the design process for any of the application domains.

Building an Enterprise-Wide ER Model

An enterprise may comprise many application domains

• May be too complex to start building a single ER model

• Must build separate models for the application domains

Consolidation needed for every step of design process before continuing!!!

• Consolidation of problem statements before starting with ER models

• Consolidation of ER models before continuing



Figure 4-52. Problems During Consolidation of ER Models CF182.0

Notes:

During the consolidation of the entity-relationship models for the various application domains, you may experience problems concerning the entity types, the relationship types, or the constraints of the different models.

The different models may contain entity types that have the same names, but a different meaning. Thus, the names must be changed to achieve uniqueness. Conversely, entity types with different names may correspond to the same business object types and, therefore, should be named the same.

Furthermore, because of the different perspectives of the individual application domains, an entity type for one application domain may just be a set of attributes of another entity type in another application domain. In this case, the set of attributes must become an entity type in the other application domain as well and the necessary relationship types using this new entity type as source or target must be established.

A further problem that may surface is that the entity keys for the same entity types in different entity-relationship models may be different. This problem is easy to resolve. Since the entity types are the same, they have the same attributes so that the same entity key can be chosen for all entity-relationship models.

Problems During Consolidation of ER Models

Problems for Entity Types

• Entity types with different names may have the same meaning

• Entity types with same name may have a different meaning

• Entity types in one ER model may be attributes in other models

• Entity keys may be different

Problems for Relationship Types

• Relationship types with same name may have a different meaning

• Relationship types with different names may have the same meaning

• Cardinalities may be different

• Properties (controlling, dependent, supertype) may be different

• Relationship key may be different

Problems for Constraints

• Constraints may be missing

• Constraints may be conflicting

As for entity types, the different entity-relationship models may contain relationship types with the same name, but with a different meaning. To remove the problem, the names must be changed to achieve uniqueness. This applies to the names for both directions of the relationship types. Conversely, differently named relationship types may have the same meaning and, therefore, should be named the same.

The cardinalities of the same relationship types may be different in different entity-relationship models. In this case, the true cardinalities must be determined and the erroneous entity-relationship models changed accordingly.

Some of the properties for relationship types may be different. If there is a difference for the controlling property, it must be determined if the deletion of the appropriate source or target instances should indeed take place on an enterprise-wide scale. If so, the controlling property must be added where it was omitted. If not, it must be dropped where specified.

If a relationship type is an owning relationship type in one entity-relationship model, but not in another, it must be checked if the dependent entity type really fulfills the dependency requirements and the proper corrections must be made in one of the models. Furthermore, a class structure may have been recognized in one entity-relationship model, but not the other. In this case, it must be introduced in the entity-relationship model where it is missing, the supertype must appropriately be identified, and relationship types using the former entity types as source or target must be verified.

In case of 1:1 relationship types, there are two choices for the relationship key. Thus, different models may have chosen a different relationship key. Just choose the same relationship key for all models concerned.

In some models, constraints may be missing that have been identified in other models. It must be verified if the constraints have an enterprise-wide scope and the erroneous model must be changed accordingly.

Furthermore, the different models may contain conflicting constraints. The conflicts must be resolved by the domain experts and the models changed accordingly.

There may be other problems during the consolidation, but this list should already give you a pretty good idea of what to look for.
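
Some of the problems listed above, such as differing cardinalities for the same relationship type, can be spotted by a simple comparison before the domain experts are consulted. The Python sketch below is a hedged illustration only: the function name and the representation of a model as a dictionary from relationship type name to a (source cardinality, target cardinality) pair are assumptions made for this example.

```python
def cardinality_conflicts(model_a, model_b):
    """Return relationship types that occur in both models with different cardinalities.

    Each model maps a relationship type name to a tuple
    (source cardinality, target cardinality).
    """
    shared_names = model_a.keys() & model_b.keys()
    return {
        name: (model_a[name], model_b[name])
        for name in shared_names
        if model_a[name] != model_b[name]
    }


if __name__ == "__main__":
    # Hypothetical cardinalities, chosen only to produce one conflict.
    planning = {"X_r1_Y": ("1..1", "m"), "X_r2_Z": ("m", "m")}
    operations = {"X_r1_Y": ("0..1", "m"), "X_r2_Z": ("m", "m")}
    print(cardinality_conflicts(planning, operations))
```

Such a comparison only finds differences between identically named relationship types. Whether identical names really have the same meaning, and which cardinalities are the true ones, still has to be determined with the domain experts.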




Checkpoint


1. Name the three major components of entity-relationship models.

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. The instances of an entity type may all have a different meaning. (T/F)

3. Explain the difference between an entity type and an entity instance.

_____________________________________________________

_____________________________________________________

_____________________________________________________

4. What is the purpose of entity keys? What is the minimum principle for entity keys?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

5. The business object types for the application domain are the primary source for the entity types of an entity-relationship model. (T/F)

6. Describe what a relationship type is.

_____________________________________________________

_____________________________________________________

_____________________________________________________




7. The source of the primary direction of a relationship type is the target of the inverse direction and the source of the inverse direction is the target of the primary direction. (T/F)

8. A relationship type can again be the source or target of a relationship type. (T/F)

9. A business relationship type can always be translated into a relationship type of the entity-relationship model for the application domain. (T/F)

10. Match the following catchwords with the corresponding cardinalities:

a. at most one          ____ 0..m
b. any number           ____ 1
c. one and only one     ____ 0..1
d. one or more          ____ m
                        ____ 1..m
                        ____ 1..1

11. Assume that you have two entity types SEAT and PASSENGER and a relationship type PASSENGER_has_SEAT expressing which seats have been assigned to passengers. A passenger may have zero, one, or multiple seats assigned to him/her. A seat can only be assigned to a single passenger.

Specify the cardinalities for the source and target of relationship type PASSENGER_has_SEAT:

a. Cardinality for source: _______

b. Cardinality for target: _______

12. Describe the terms 1:1 relationship type, 1:m relationship type, and m:m relationship type.

_____________________________________________________

_____________________________________________________

_____________________________________________________



13. Assume that you have the following entity-relationship model:

[Diagram: relationship type r1 connects entity types A and B (cardinality labels m and 1); relationship type r2 connects entity types A and C (cardinality label m).]

a. For relationship type r1, how many instances of entity type B can at most be connected to an instance of entity type A?

__________________________________________________

b. For relationship type r1, how many instances of entity type B must at least be connected to an instance of entity type A?

__________________________________________________

c. For relationship type r1, how many instances of entity type A can at most be connected to an instance of entity type B?

__________________________________________________

d. For relationship type r1, how many instances of entity type A must at least be connected to an instance of entity type B?

__________________________________________________

e. For relationship type r2, how many instances of entity type A can at most be connected to an instance of entity type C?

__________________________________________________

f. For relationship type r2, how many instances of entity type A must at least be connected to an instance of entity type C?

__________________________________________________

14. Based on the entity-relationship model for the previous checkpoint question, assume that entity types A, B, and C have the following instances:

Entity Type    Entity Instances
A              A1, A2, A3
B              B1, B2, B3, B4
C              C1

Are all of the following relationship instances for relationship type r1 possible? If not, explain why they are not all possible.

Relationship Type    Relationship Instances
r1                   (A1, B3), (A1, B4), (A3, B1), (A3, B3)

_____________________________________________________

_____________________________________________________

Would you expect any relationship instances for relationship type r2?

_____________________________________________________

_____________________________________________________



15. [Diagram: entity types A, B, and C with relationship types r1 and r2; cardinality labels 1..m, 1, m, and m.]

List the defining attributes and the relationship keys for relationship types r1 and r2. Use the term key of ... to describe them.

Defining attributes for r1: _______________________________

Relationship key for r1: _______________________________

Defining attributes for r2: _______________________________

Relationship key for r2: _______________________________

16. The entity key of a dependent entity type must be equal to the entity key of another entity type or the relationship key of a relationship type. (T/F)

17. Name the criteria that an entity type must fulfill to be a dependent entity type.

_____________________________________________________

_____________________________________________________

_____________________________________________________

18. [Diagram: entity types A and B connected by owning relationship type r1 (labels m and D).]

Furthermore, assume that A and B have the following entity instances (just the keys are shown):

Entity Type    Entity Instances
A              A1, A2, A3
B              A1.B1, A1.B2, A2.B1, A3.B1

Can the owning relationship type r1 have the following relationship instances?

(A1, A1.B1), (A1, A1.B2), (A1, A2.B1), (A2, A2.B1), (A3, A3.B1)

_____________________________________________________

_____________________________________________________

_____________________________________________________

19. How can you represent the nondefining attributes for a relationship type in an entity-relationship model?

_____________________________________________________

_____________________________________________________

_____________________________________________________

20. If you specify the controlling property for the target of a relationship type, a relationship instance is to be deleted when its target instance is deleted. (T/F)

21. If you specify the controlling property for the target of a relationship type, the target instance belonging to a relationship instance is to be deleted when the relationship instance is deleted. (T/F)



22. [Diagram: entity types A, B, C, and D with relationship types r1, r2, r3, and r4. All cardinality labels are m, and the controlling property (C) is specified three times.]

Furthermore, assume that the entity types and relationship types have the following instances:

Object    Instances
A         A1, A2, A3
B         B1, B2
C         C1, C2, C3
D         D1, D2, D3
r1        (C1, A2), (C2, A3)
r2        (A1, B1), (A2, B1), (A3, B2)
r3        (C1, D1), (C1, D2), (C2, D3)
r4        ((A1, B1), (C1, D2)), ((A1, B1), (C2, D3))

Which instances will the various entity types and relationship types have after entity instance C2 of entity type C has been deleted?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

23. The purpose of supertypes and subtypes is to categorize the instances of entity types. (T/F)

24. Which are the components of a class structure?

_____________________________________________________

_____________________________________________________

_____________________________________________________

25. What is the is-bundle?

_____________________________________________________

_____________________________________________________

_____________________________________________________

26. Match the following partial sentences with the proper bundle cardinalities:

a. The subtype set is exclusive, but not covering if the bundle cardinality is ...          ____ 1..1
b. The subtype set is covering, but not exclusive if the bundle cardinality is ...          ____ 1..m
c. The subtype set is exclusive and covering if the bundle cardinality is ...               ____ 0..1
d. The subtype set is not covering and not exclusive if the bundle cardinality is ...       ____ 0..m

27. Which mechanism can you use to restrict the instances of entity types or relationship types?

_____________________________________________________

_____________________________________________________

28. Name the three components of constraints.

_____________________________________________________

_____________________________________________________

_____________________________________________________




29.What is the format of a constraint in the entity-relationship model?

_____________________________________________________

_____________________________________________________

_____________________________________________________





Figure 4-53. Unit Summary (1 of 3) CF182.0

Notes:

Unit Summary (1 of 3)

The three major components of entity-relationship models are:

Entity types, relationship types, constraints

Entity types are conceptual units representing classes of objects with the same meaning and characteristics

Have attributes (conceptual pieces of information)Instances uniquely identified by entity keyPrimary source: business object types for application domain

Cardinalities for relationship types determine:

How many interrelationships a source instance can and must have with a target instanceHow many interrelationships a target instance can and must have with a source instance

Relationship types are classes of interrelationships between the instances of entity types and/or relationship types

All interrelationships have the same meaning and characteristics

All interrelationships interconnect two instances

A relationship type has a primary and an inverse direction

Primary source: business relationship types for application domain





Notes:


Dependent entity type is an entity type connected to a parent entity type or a relationship type via an owning relationship type

Each dependent instance connected to one and only one parent instance

Key portion of dependent entity type = key of parent

Only instances interconnected with matching key portion/key values

Defining attributes completely describe instances of relationship type

Relationship key uniquely identifies instances of relationship type

Relationship type may have nondefining attributes

Modeled by means of dependent entity types

Controlling property for relationship type specifies if source or target instance to be deleted when relationship instance is deleted

Cascading effect

Class structures allow the categorization of entity instances

Supertype = generalization of subtypes

Subtypes = specializations of supertype

Subtype set may be exclusive and/or covering






Notes:


Is-bundle = set of _is_ relationship types connecting the supertype to its subtypes

Represented as a fork with handle starting at supertype

Bundle cardinality specifies to how many subtype instances a supertype instance can be and must be connected

Large entity-relationship models must be split into pages

ER models for subareas, views, or logical subsets of application domain

Together submodels must cover entire entity-relationship model

Constraints are interdependencies between the objects of an ER model restricting the possible instances of entity types or relationship types

Consist of constraining objects, constrained objects, and a rule

Primary source: business constraints for application domain

When building an enterprise-wide entity-relationship model:

Start with entity-relationship models for separate application domains

Consolidate results for every step of design process before moving on to next step



Unit 5. Data and Process Inventories

This unit describes the purpose and content of data and process inventories. Furthermore, it describes methods for developing them and gives examples for their content.


After completing this unit you should be able to:

• Explain the purpose of data and process inventories.

• Explain the significance of data inventories for database design.

• Understand who has the responsibility for the creation of data and process inventories.

• Describe the content data and process inventories should have for database design.

• Summarize some methods for establishing data and process inventories.


Accountability:



© Copyright IBM Corp. 2000, 2002 Unit 5. Data and Process Inventories 5-1



Notes:

Up to now, from the problem statement, the entity-relationship model for the application domain has been developed. To develop the corresponding database, you must determine the data that should be contained in the database before you can proceed. This means that you must establish a list of all data for the application domain, that is, the data inventory.

In this unit, we will talk about the data inventory and the process inventory which is interrelated with it. We will describe their purposes and explain the significance of the data inventory for database design. You will find out whose responsibility it is to establish the data and process inventories for the application domain.

In addition, you will learn what the content of data and process inventories should be from the perspective of database design. The process inventory is primarily intended for application programmers, but is important for database design: The descriptions of its business processes reveal the data that should be contained in the database for the application domain.

Unit Objectives

Explain the purpose of data and process inventories

Explain the significance of data inventories for database design

Understand who has the responsibility for the creation of data and process inventories

Describe the content data and process inventories should have for database design

Summarize some methods for establishing data and process inventories




5.1 Data Inventory



Figure 5-2. Data and Process Inventories in Design Process CF182.0

Notes:

The preceding steps of the design process dealt with the problem statement and the entity-relationship model for the application domain. To establish the database for the application domain from the entity-relationship model, you need to know the various pieces of data to be stored in the database. The data for the application domain are described in the data inventory.

Data inventory and process inventory are developed in parallel during the conceptual view of the design process. They are established after the entity-relationship model because the entity-relationship model can be used in their development and is verified as part of their development.

In principle, the data inventory can be developed without the process inventory. However, the best method for developing the data inventory is to couple its development and the development of the process inventory. The process inventory contains a description of all business processes for the application domain. The description for a business process lists the data used by the business process. Hence, the process inventory reveals the data elements that should be contained in the database for the application domain and, thus, should be described in the data inventory.

By coupling the data and process inventories, you can ensure that all data needed by documented business processes of the application domain are contained in the data inventory and only these data. Consequently, the database will contain precisely the required data. Furthermore, you can ensure that the data inventory is updated as new business processes are planned and recorded in the process inventory.

[Visual: Data and Process Inventories in Design Process. Labels shown: Problem Statement, Conceptual View, Data Inventory, Process Inventory, Tuple Types, Tables, Indexes, Integrity Rules.]
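The coupling of the two inventories described above can be illustrated with a small Python sketch. It is only an illustration under assumptions: the process names, the data names, and the function `compare_inventories` are hypothetical and do not come from the course material.

```python
# Hypothetical, simplified inventories; all names are illustrative only.
process_inventory = {
    "Schedule Flight": {"Flight Number", "Planned Departure Time"},
    "Record Departure": {"Flight Number", "Actual Departure Time"},
}
data_inventory = {"Flight Number", "Planned Departure Time", "Aircraft Serial Number"}


def compare_inventories(processes, data):
    """Return (missing, unused) data names for the two inventories."""
    needed = set()
    for used_by_process in processes.values():
        needed |= used_by_process
    missing = needed - data  # used by a business process, but not yet described
    unused = data - needed   # described, but used by no documented business process
    return missing, unused


missing, unused = compare_inventories(process_inventory, data_inventory)
print("Missing from data inventory:", sorted(missing))
print("Not used by any process:", sorted(unused))
```

Both result sets should be empty when the data inventory contains all data needed by the documented business processes and only these data.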




Figure 5-3. Data Inventory - Purpose and Responsibilities CF182.0

Notes:

The data inventory contains a detailed description of all data for the application domain: It describes all abstract data types, data elements, and data groups for the application domain.

Data elements are indivisible pieces of data. They cannot be divided into smaller pieces meaningful for the application domain.

In contrast, data groups are sets of logically related (for the application domain) data elements and/or data groups. This is a recursive definition and implies that data groups can contain data groups. The data elements or data groups of a data group are referred to as components of the (owning) data group.

A data group can be viewed as a tree structure of one or more levels whose lowest level nodes (terminal nodes) are data elements. An example of a data group may be Name of Person, i.e., the name of a person, consisting of data elements Last Name, First Name, and Middle Initial.
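The tree view of a data group can be sketched in Python as follows. This is a minimal illustration, not part of the design method itself: the classes `DataElement` and `DataGroup`, the function `terminal_elements`, and the outer group Person Details with its element Date of Birth are assumptions made for this example. Only Name of Person and its three data elements come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataElement:
    """An indivisible piece of data (a terminal node of the tree)."""
    name: str


@dataclass
class DataGroup:
    """A set of logically related data elements and/or data groups."""
    name: str
    components: List[Union["DataElement", "DataGroup"]] = field(default_factory=list)


def terminal_elements(node):
    """Return the names of all data elements below (or equal to) node."""
    if isinstance(node, DataElement):
        return [node.name]
    names = []
    for component in node.components:
        names.extend(terminal_elements(component))  # recursion mirrors the recursive definition
    return names


name_of_person = DataGroup("Name of Person", [
    DataElement("Last Name"),
    DataElement("First Name"),
    DataElement("Middle Initial"),
])

# Hypothetical two-level data group containing another data group.
person_details = DataGroup("Person Details", [name_of_person, DataElement("Date of Birth")])

print(terminal_elements(person_details))
```

The recursion ends at data elements, which matches the statement that the lowest level nodes of the tree are always data elements.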

Data Inventory - Purpose and Responsibilities

Detailed description of all data for application domain

Data elements and data groups

Data element = indivisible piece of data

Data group = group of logically related data elements and/or data groups

Independent of entity types

Multiple entity types may use same data for different purposes

Jointly created by:

Application domain expert

Knows application domain

Database designer

Knows what is needed for database design

Knows entity types for data elements/data groups

Input for database designer




The correct identification of data groups is important for the later steps of the design process since they identify items that logically belong together. In particular, they enable the recognition of repetitive groups and of groups of data that can be separated out (vertical splitting).

Since the data inventory is part of the conceptual view, it should be purely application-domain oriented. It should not contain data elements or data groups caused by implementation and not having a direct meaning for the application domain.

The entity-relationship model is input for the development of the data inventory. It helps identify data elements and data groups. Data elements and data groups can be viewed as abstractions or generalizations of elementary attributes and composite attributes, respectively. They define elementary or composite data for the application domain independent of their usage by entity types. Therefore, the definition of the data elements and data groups should be independent of the entity types of the entity-relationship model.

In this context, the question arises whether you should have two different data elements or data groups for data with the same fundamental meaning, but a (slightly) different usage. For example, for our sample airline company, we want to store the planned departure time and the actual departure time for a flight. Should you have different data elements Planned Departure Time and Actual Departure Time or just a single data element Departure Time? Planned Departure Time and Actual Departure Time are certainly different attributes for entity type FLIGHT.

The answer is that both solutions are feasible. If you choose a single data element, data element Departure Time is used in two different roles (purposes) by entity type FLIGHT: It is used as planned departure time (attribute name Planned Departure Time) and as actual departure time (attribute name Actual Departure Time). If you choose two different data elements, they are used by entity type FLIGHT in their fundamental meanings. In this case, the attribute names can be the names of the data elements.

It is a matter of taste and judgement where you make the assignment of roles: on the data element level, the data group level, or the entity type level. If you make the differentiation on the data element level, you must make the description of the data elements more restrictive, but need not deal with roles. If you differentiate on the data group or entity type level, you can keep the description of the data elements more general, but have to deal with roles. Although the roles must be described as well, the definition work may be somewhat less in the latter case.

However, you should make a sensible trade-off. In the extreme, you could decide on having a single data element for all data with the same data type and use roles for all usages. For example, you could have a single data element Time representing a time and define all kinds of roles for it: as planned time of departure, as actual time of departure, as planned time of arrival, as actual time of arrival, and so on. By now, you should know that this is certainly not the way to go. Having data elements Departure Time and Arrival Time would probably be adequate in this case.




As described above, data elements or data groups may be used as components by other data groups. They may also be used as attributes by entity types or tuple types (as we will see later on).

Data elements and data groups as such do not have cardinalities. However, when a data element or data group is used as a component or attribute, a cardinality is associated with it. The cardinality specifies how many values the data element/data group must assume at least and at most and will assume on average for this usage. Note that two sets having the same data elements and data groups are considered different data groups if the cardinalities of the components are different.
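A usage with its cardinality can be sketched in Python as shown below. The class `Usage`, the use of `None` for an unlimited maximum, and the average values are assumptions made for this illustration. The sketch also shows why two sets with the same components, but different cardinalities, are different data groups.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """One use of a data element or data group as a component or attribute."""
    component: str
    minimum: int
    maximum: Optional[int]  # None stands for "not limited" (an assumption of this sketch)
    average: float

    def allows(self, number_of_values: int) -> bool:
        """True if the number of values lies within minimum..maximum."""
        if number_of_values < self.minimum:
            return False
        return self.maximum is None or number_of_values <= self.maximum


# Same components, different cardinalities: two different data groups.
group_a = frozenset({Usage("Last Name", 1, 1, 1.0), Usage("First Name", 1, 1, 1.0)})
group_b = frozenset({Usage("Last Name", 1, 1, 1.0), Usage("First Name", 0, 1, 0.9)})

print(group_a == group_b)
```

Because the cardinality belongs to the usage and not to the data element, the same data element can appear in several usages with different cardinalities.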

A data element or data group may be used by many data groups and entity types. A data element or data group may even be used multiple times (for different purposes) by the same data group or entity type.

When a data group is used by an entity type, it becomes a composite attribute of the entity type. This means that the entity type contains elementary attributes for all data elements of the tree structure for the data group.

The data inventory must be created by someone with detailed knowledge of the application domain. Thus, the application domain expert must be involved in the creation of the data inventory. However, he/she needs the help of the database designer. The database designer knows best what is needed for database design. He/She knows the entity types of the entity-relationship model that may contain the data elements or data groups and has a better understanding of data types. The data inventory identifies the entity types using the data elements or data groups and describes the abstract data types the data elements are based upon.

According to the above, the application domain expert and the database designer must jointly develop the data inventory.

The data inventory is input for the database designer who needs it during the later steps of the design process. The data inventory also helps application programmers when designing the application programs or queries for the business processes of the application domain.





Figure 5-4. Contents of Data Inventory (1 of 3) CF182.0

Notes:

The first component of the data inventory is the description of the abstract data types for the application domain.

Data types describe the values data of that data type can assume and the operations that can be performed with the data. Thus, by associating a data element with a data type, you define the fundamental values and the operations for the data element.

In support of the SQL standard, all relational database management systems provide a set of standard data types such as INTEGER, DECIMAL, CHARACTER, DATE, or TIME. These standard data types are general-purpose data types covering many situations, but are imprecise.

Abstract data types go beyond standard data types. You tailor them for your application domain so they reflect the values the associated data elements can assume and the operations that can be performed with the data elements.

Contents of Data Inventory (1 of 3)

Abstract Data Types

For each data type:

Signature: A unique name for the data type followed, in parentheses and separated by commas, by the parameters for the data type

Values: A description of the values that data belonging to the data type can assume

For a finite number of values, a list of the possible values

Can be a textual description of the values

Can be values of another data type or a subset thereof

Can be defined by a formula

Operations: A description of the operations that can be performed for data of the data type

Including operator name, operands, and results

Including Equal Comparison (=) determining when data of the data type are considered equal

As an example, take the employee numbers for pilots and mechanics of our sample airline company called Come Aboard. They consist of digits, and you would be tempted to assign them to the standard data type INTEGER. However, they are not really integers: You should not perform the usual integer operations (such as integer addition and subtraction) with them. Furthermore, they cannot be negative, and leading zeros have a meaning and should not be suppressed. You might suggest defining them as CHARACTER data. However, this would result in a different problem: Employee numbers could contain letters, which is not correct either. The solution is an abstract data type reflecting that employee numbers consist of digits and cannot be added or subtracted.

By implementing abstract data types, you can ensure that the values of the data of your database are always "syntactically" correct. You can also prevent undesired or illegal operations for them.
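As an illustration of the employee number example, the following Python sketch shows one way such an abstract data type could be implemented in an application program. The class name `EmployeeNumber` and the restriction to ASCII digits are assumptions of this sketch. The course material does not prescribe an implementation.

```python
class EmployeeNumber:
    """Employee number: digits only, leading zeros kept, no arithmetic defined."""

    def __init__(self, value: str):
        # Reject letters, signs, blanks, and the empty string.
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"not a valid employee number: {value!r}")
        self._value = value  # stored as text so leading zeros are not suppressed

    def __eq__(self, other):
        # Equal Comparison: same digits in the same positions.
        if not isinstance(other, EmployeeNumber):
            return NotImplemented
        return self._value == other._value

    def __hash__(self):
        return hash(self._value)

    def __str__(self):
        return self._value

    # Deliberately no __add__ or __sub__: adding employee numbers is meaningless.


number = EmployeeNumber("00123")
print(number)
```

Since the class defines no arithmetic operators, an attempt to add two employee numbers raises a `TypeError`, and an input such as "12A4" is rejected when the value is created.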

The same data type can be used by many data elements. Sometimes, different data elements have only slightly different requirements on their data types raising the question: can the same data type be used? For example, for two character-string type data elements, only the allowable maximum length may be different.

To enable the common usage for slight differences, data types can be parameterized. For each data element, you can specify different parameter values. For the character-string example, the parameters could be the minimum length and the maximum length for the respective character-strings.

Data types can only be associated with data elements. They cannot be associated with data groups since these just have a grouping function. Data groups may be composed of many data elements all having different data types.

The data inventory should contain a description of all abstract data types for your application domain. The descriptions of the data elements will refer to the data types. Preferably, you should only use your own abstract data types. However, realistically, most data inventories will also use the standard data types of the SQL standard. Include a list of the standard data types used by your data elements.

For each abstract data type, you should describe the following:

• The Signature of the Abstract Data Type

The signature consists of a unique name for the data type followed, in parentheses and separated by commas, by the parameters for the data type.

• The Values for the Abstract Data Type

The description of the values depends on the abstract data type:

- If the abstract data type can only assume a finite number of values, the values can be listed.

- The values of the abstract data type may be the values of another abstract data type, of a standard data type, or of a subset thereof. In this case, specify the appropriate subset.

- The values of the abstract data type may be defined by a formula. This is especially true for integer data which must satisfy a specific formula.




- Frequently, the values can only be described by text.

• The Operations for the Abstract Data Type

For each operation, specify the name of the operator, the operands, and the result of the operation. You can provide them in a manner similar to that for the signature.

You always want to include the Equal Comparison (comparison for equality). This operation determines whether or not, and thus when, two values of the data type are considered equal. As we will see in the subsequent examples, multiple allowable values of an abstract data type may be considered equal.

The abstract data types for your application domain must be implemented to become effective. It depends on the features of the database management system whether they can be implemented as part of the database or must be implemented via application programs. Sometimes, you want both: an implementation in the application programs to intercept invalid input as soon as possible, and an implementation in the database to prevent update operations that directly use the data manipulation language of the database management system from corrupting the data.



Figure 5-5. Sample Data Types (1 of 4) CF182.0

Notes:

The abstract data type described on the visual deals with text data, i.e., descriptive text such as remarks added to the maintenance records for Come Aboard. The values consist of arbitrary strings of printable characters. The strings must have at least as many characters as specified by parameter minimum-length of the signature and not more than specified by parameter maximum-length.

Both parameters of the signature are optional. If minimum-length is not specified, a default minimum length of 1 is assumed. If maximum-length is not specified, the length of the string is not limited.

Two operations are allowed for text data:

• Operation

NORM(text-data-1) → text-data-2

normalizes text-data-1. This means it removes leading and trailing blanks and replaces intermediate groups of blanks by a single blank each. The result is again a text data string, text-data-2.

Sample Data Types (1 of 4)

Text Data

Signature:

TEXTDATA( [ minimum-length ] [ , maximum-length ] )

Values:

Any string of printable characters. Minimum-length and maximum-length specify how many characters the string has at least (default: 1) and at most (default: unlimited).

Operations:

Normalize Text Data

NORM(text-data-1) → text-data-2

Removes all leading and trailing blanks from text-data-1

Reduces intermediate groups of blanks for text-data-1 to a single blank each

Equal Comparison

EQUAL(text-data-1, text-data-2) → { TRUE | FALSE }

Normalizes text-data-1 and text-data-2 and compares them character by character

Result is TRUE if all characters are equal; FALSE otherwise




For example, " This is text ..." becomes "This is text ..." when normalized. (Note that the surrounding double-quotes do not belong to the text. They are used here for clarity purposes and delimit the text.)

• The second operation is the equal comparison for text data:

EQUAL(text-data-1, text-data-2) → {TRUE | FALSE}

Text-data-1 and text-data-2 are input for the operation. The operation normalizes both strings and compares the normalized text data character by character. If the normalized strings are character-wise identical, they are considered equal and the result is TRUE. If they are character-wise different, the result is FALSE. In particular, the strings are not considered equal if the corresponding normalized strings have a different length.

Accordingly, the strings " Equal strings" and "Equal strings " are considered equal.

In the database, you want text data to be stored in the normalized form. Normalization of text data is important when searching for a specific string based on user input. User input may not always be normalized.
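The two operations for text data can be sketched in Python as follows. The function names `norm`, `equal`, and `is_text_data` are chosen for this illustration. The sketch assumes that a blank is the space character and that the length limits are checked against the string as given, which the description above leaves open.

```python
import re


def norm(text: str) -> str:
    """NORM: remove leading and trailing blanks, reduce inner groups of blanks to one."""
    return re.sub(r" +", " ", text.strip(" "))


def equal(text_1: str, text_2: str) -> bool:
    """EQUAL: compare the normalized strings character by character."""
    return norm(text_1) == norm(text_2)


def is_text_data(text: str, minimum_length: int = 1, maximum_length=None) -> bool:
    """Check a value against TEXTDATA( [minimum-length] [, maximum-length] )."""
    if not text.isprintable():
        return False
    if len(text) < minimum_length:
        return False
    return maximum_length is None or len(text) <= maximum_length


print(norm("  This   is text ..."))
print(equal(" Equal strings", "Equal strings "))
```

Storing `norm(value)` rather than the raw input keeps the stored text data in normalized form, so that a search based on normalized user input finds it.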





Notes:

The abstract data type described on the visual deals with name data such as the last name or first name of a person. Names do not allow arbitrary characters. They allow letters, blanks, and single dashes (-) or periods (.). Thus, the values for abstract data type Name Data consist of strings of these characters of the specified minimum and maximum lengths.

For name data, you have the same operations as for text data. However, the normalization of name data uppercases all letters. For example, "Miller", "MiLLer", and "miller" all become "MILLER" when normalized. (Note that the surrounding double-quotes do not belong to the text.)

As for text data, the equal comparison compares the normalized strings. Thus, "Miller", "MiLLer", and "miller" are all considered equal.

In the database, you want name data to be stored in the normalized form. Normalization of name data is important when searching for a specific name based on user input. User input may not always be normalized.
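A corresponding Python sketch for name data is shown below. The function names are illustrative. The sketch interprets "single dashes or periods" as meaning that two dashes or two periods must not follow each other, and it accepts any Unicode letter; both are assumptions of this example.

```python
import re


def norm_name(name: str) -> str:
    """NORM for name data: normalize blanks as for text data, then uppercase all letters."""
    return re.sub(r" +", " ", name.strip(" ")).upper()


def equal_name(name_1: str, name_2: str) -> bool:
    """EQUAL for name data: compare the normalized strings."""
    return norm_name(name_1) == norm_name(name_2)


def is_name_data(name: str, minimum_length: int = 1, maximum_length=None) -> bool:
    """Check a value against NAMEDATA( [minimum-length] [, maximum-length] )."""
    if not all(ch.isalpha() or ch in " -." for ch in name):
        return False
    if "--" in name or ".." in name:  # assumed meaning of "single" dashes or periods
        return False
    if len(name) < minimum_length:
        return False
    return maximum_length is None or len(name) <= maximum_length


print(norm_name("MiLLer"))
print(equal_name("Miller", "miller"))
```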


Name Data

Signature:

NAMEDATA( [ minimum-length ] [ , maximum-length ] )

Values:

Any string of letters, blanks, and single dashes (-) or periods (.). Minimum-length and maximum-length specify how many characters the string has at least (default: 1) and at most (default: unlimited).

Operations:

Normalize Name Data

NORM(name-data-1) → name-data-2

Removes all leading and trailing blanks from name-data-1

Reduces intermediate groups of blanks for name-data-1 to a single blank each

Uppercases all letters of name-data-1

Equal Comparison

EQUAL(name-data-1, name-data-2) → { TRUE | FALSE }

Normalizes name-data-1 and name-data-2 and compares them character by character






Notes:

The abstract data type on this visual deals with alphanumeric strings. They consist of letters and digits. Examples are the aircraft serial numbers for Come Aboard. Again, the abstract data type is parameterized: The minimum and maximum lengths for the alphanumeric string can be specified.

For this abstract data type, lowercase letters and uppercase letters are considered different since a normalization operation has not been provided. Leading or trailing blanks are not allowed. They are considered improper input.
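The behavior described above can be sketched in Python as follows. The function names and the sample serial numbers are hypothetical, and the restriction to ASCII letters and digits is an assumption of this sketch.

```python
def is_alphanumeric(value: str, minimum_length: int = 1, maximum_length=None) -> bool:
    """Check a value against ALPHANUMERIC( [minimum-length] [, maximum-length] )."""
    # isalnum() is False for blanks and for the empty string.
    if not (value.isascii() and value.isalnum()):
        return False
    if len(value) < minimum_length:
        return False
    return maximum_length is None or len(value) <= maximum_length


def equal_alphanumeric(value_1: str, value_2: str) -> bool:
    """EQUAL: plain character-by-character comparison, no normalization."""
    return value_1 == value_2


print(is_alphanumeric("N123AB", 1, 10))
print(equal_alphanumeric("N123AB", "n123ab"))
```

Because there is no normalization step, values that differ only in case are not equal, and a value with a leading or trailing blank fails the check instead of being silently trimmed.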


Alphanumeric String

Signature:

ALPHANUMERIC( [ minimum-length ] [ , maximum-length ] )

Values:

Any string of alphanumeric characters. Minimum-length and maximum-length specify how many characters the string must have at least (default: 1) and at most (default: unlimited).

Operations:

Equal Comparison

EQUAL(alphanumeric-1, alphanumeric-2) → { TRUE | FALSE }

Compares alphanumeric-1 and alphanumeric-2 character by character





Notes:

The values of the abstract data type on this visual are the international codes for the airports serviced by Come Aboard. Thus, they form a finite set of values that can be listed. As indicated by the ellipsis (three dots) at the end, only a few values are shown on the visual.

The abstract data type is not parameterized. It only supports the equal comparison.
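A data type with a finite list of values can be sketched in Python as a membership test. The set below contains only the codes shown on the visual; the complete list is not given there, so the set is knowingly incomplete. The names `KNOWN_AIRPORT_CODES`, `is_airport_code`, and `equal_airport_code` are chosen for this illustration.

```python
# Only the codes shown on the visual; the full list of serviced airports is longer.
KNOWN_AIRPORT_CODES = frozenset({
    "ATL", "CDG", "DFW", "FCO", "FRA", "JFK", "LAS", "LAX",
    "MAD", "ORD", "SAN", "SFO", "SJC", "STR", "ZRH",
})


def is_airport_code(value: str) -> bool:
    """A value is valid only if it is one of the listed acronyms."""
    return value in KNOWN_AIRPORT_CODES


def equal_airport_code(code_1: str, code_2: str) -> bool:
    """EQUAL: character-by-character comparison of two valid airport codes."""
    if not (is_airport_code(code_1) and is_airport_code(code_2)):
        raise ValueError("both operands must be airport codes")
    return code_1 == code_2


print(is_airport_code("FRA"))
print(equal_airport_code("FRA", "JFK"))
```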


Airport Code

Signature:

AIRPORT CODE

Values:

One of the following acronyms: ATL, CDG, DFW, FCO, FRA, JFK, LAS, LAX, MAD, ORD, SAN, SFO, SJC, STR, ZRH, ...

Operations:

Equal Comparison

EQUAL(airport-code-1, airport-code-2) → { TRUE | FALSE }

Compares airport-code-1 and airport-code-2 character by character






Notes:

The second component of the data inventory is the inventory of the data elements and data groups for the application domain. For the application domain, data elements are indivisible pieces of information. Data groups are groups of logically related data elements or data groups as explained before.

For each data element or data group, you should provide the following basic items:

Name

The unique name for the data element or data group. Each data element and data group receives a unique name. The name should clearly express the meaning of the data element or data group for the application domain. It may consist of multiple words. We will start each word with a capital letter except for connecting words such as of or for.

The names are used by the business processes, described in the process inventory, to refer to the data elements and data groups they need.


Data Elements and Data Groups

For each data element/data group:

Name: A unique identifier for the data element or data group

One or more words

Each word starting with a capital letter except for connecting words such as of or for

As natural as possible for application domain

Type: Data element or data group

Description: A detailed textual description of the meaning of the data element or data group for the application domain

As precise as possible to avoid synonyms and homonyms

Data Type: If data element, data type for data element

Lengths: If data element:

Minimum Length

Maximum Length

Average Length

Number of Digits

Decimal Places

Domain: If data element, value constraints for data element over and above those imposed by data type




Type

The type of the object described, i.e., if the object is a data element or a data group. Thus, the proper values are data element and data group, respectively.

Description

A detailed textual description of the meaning of the data element or data group for the application domain. The description should be as precise as possible to avoid synonyms and homonyms.

Synonyms are data elements or data groups having a different name, but meaning the same object. Synonymous data elements or data groups can lead to the same information being stored multiple times, i.e., to the redundant storage of information.

Homonyms are data elements or data groups whose names and descriptions are ambiguous, so that they can be interpreted to mean different things. Their names and descriptions should be made unambiguous. If necessary, they must be split into multiple data elements or data groups.

Data Type

For data elements, the data type (standard data type or abstract data type) of the data element including the applicable values for parameters of the data type.

For data groups, this item is not applicable. It should either be marked as not applicable or be omitted.

Lengths

For data elements, the lengths applicable for them. For string-type data elements, important lengths are: their minimum length, their maximum length, and their average length. The average length is important for estimates made by the database administrator when allocating space for tables.

For numbers, important values are: their number of digits and, if applicable, the number of decimal places.

If the data type for the data element is parameterized, some of these values may already have been specified as parameters for the data type.

For integers, you may prefer to specify a domain, that is, the range of values that can be assumed. The number of digits can then be derived from the specified range.
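Deriving the number of digits from a domain can be sketched in Python as follows. The function name `digits_needed` and the sample ranges are assumptions made for this illustration. The sign is not counted as a digit.

```python
def digits_needed(minimum: int, maximum: int) -> int:
    """Number of digits required to represent every integer in minimum..maximum."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # The bound with the larger absolute value determines the number of digits.
    return max(len(str(abs(minimum))), len(str(abs(maximum))))


print(digits_needed(0, 9999))
print(digits_needed(1, 500))
```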


Domain

For data elements, value constraints over and above those implied by the data type for the data element.

If you use abstract data types extensively and correctly, you probably will not need additional value constraints.




If you use standard data types, such as INTEGER, you might want additional value constraints such as the minimum and maximum values the data element can assume.

If multiple data elements basically have the same data type and only differ marginally in their allowable values, you might decide to use a single abstract data type covering the values of all data elements concerned and define value constraints for the various data elements.






Notes:

In addition to the basic items on the previous visual, you should provide the following items for a data element or data group:

Data Groups

The data element or data group being described may be a component of other data groups. It may belong to a data group more than once, however, in different roles. For each data group the data element or data group belongs to and each role played, provide the following:

• The name of the owning data group.

• The role played in the other data group. Provide a textual description of the role and the name the data element or data group assumes in that role. Description and name need only be provided if they are different from the fundamental purpose and name of the data element or data group.

• The cardinality of the data element or data group for the role in the data group. Provide minimum, maximum, and average cardinality. This means, specify how many values the data element or data group must assume at least and at most and will assume on average for this usage. If the maximum cardinality is not limited, use an asterisk (*).

Data Groups: For each data group the data element or data group belongs to and for each role it plays in the data group:

Data Group (Name), Role (Description, Name), Cardinality (Min, Max, Avg)

May play different roles in same data group

Cardinality may be different for each data group and role

Entity Types: For each entity type the data element or data group belongs to and for each role it plays for the entity type:

Entity Type (Name), Role (Description, Name), Cardinality (Min, Max, Avg)

Completeness Check for Entity-Relationship Model

Entity Types

Most of the time, data elements or data groups are immediately used by entity types and not indirectly through data groups. If the data element or data group is a direct attribute of an entity type, provide the following:

• The name of the entity type. The entity-relationship model helps you determine the appropriate entity types.

• The role played for the entity type. Provide a textual description of the role and the name the data element or data group assumes in that role. Description and name need only be provided if they are different from the fundamental purpose and name for the data element or data group.

• The cardinality of the data element or data group for the role it plays in the entity type. Provide minimum, maximum, and average cardinality. This means, specify how many values the data element or data group must assume at least and at most and will assume on average for this usage. If the maximum cardinality is not limited, use an asterisk (*).

Do not provide an entry for this item if the data element or data group is not immediately used by entity types.

By determining the entity types for data elements and data groups, you verify the completeness of the entity-relationship model. If you find a data element or data group that cannot be associated with another data group or an entity type, the entity-relationship model is incomplete and must be corrected.

Relationship types are not of interest in this context because their defining attributes are derivatives of the keys of their source and target.
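
The completeness check described above can be sketched in code. The following Python fragment is only an illustration: the class `InventoryEntry`, its field names, the function `unassigned_entries`, and the entry "Frequent Flyer Number" are assumptions made for this example and do not come from the course material. The sketch flags every data element or data group that is used neither by a data group nor by an entity type.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """A data inventory entry reduced to the items needed for the check."""
    name: str
    data_groups: list = field(default_factory=list)   # data groups using the entry
    entity_types: list = field(default_factory=list)  # entity types using it directly


def unassigned_entries(inventory):
    """Return the names of entries used by no data group and no entity type."""
    return [e.name for e in inventory if not e.data_groups and not e.entity_types]


inventory = [
    InventoryEntry("Last Name", data_groups=["Name of Person"]),
    InventoryEntry("Name of Person", entity_types=["EMPLOYEE"]),
    InventoryEntry("Frequent Flyer Number"),  # invented entry without any usage
]

# A non-empty result signals that the entity-relationship model is incomplete.
print(unassigned_entries(inventory))
```

Any name reported by such a check would have to be resolved by correcting the entity-relationship model, not by dropping the entry silently.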




Figure 5-11. Sample Data Elements and Groups (1 of 7) CF182.0

Notes:

The above visual illustrates a data element for our sample airline company. The data element is a component of a data group. It is not used as a direct attribute of an entity type.

The data element is called Last Name and represents the last name of a person (for example, a pilot or mechanic). The data type for the data element is Name Data, defined before as an abstract data type. The signature NAMEDATA(1, 60) specifies that a last name must consist of at least one character and must not have more than 60 characters. The abstract data type is described on page 5-14.

The lengths relevant for last names are the minimum length, the maximum length, and the average length. Minimum length and maximum length must be the same as for the signature for the data type.
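
The rule that minimum and maximum length must agree with the signature of the data type can be checked mechanically. The following Python sketch assumes that a signature always has the textual form TYPE(min, max); the class `DataElement` and the functions `signature_bounds` and `lengths_consistent` are hypothetical names introduced for this illustration. The additional test that the average length lies between minimum and maximum is an assumption of the sketch, not a rule stated in the course.

```python
import re
from dataclasses import dataclass


@dataclass
class DataElement:
    name: str
    data_type: str    # signature such as "NAMEDATA(1, 60)"
    min_length: int
    max_length: int
    avg_length: int


def signature_bounds(signature):
    """Extract (minimum, maximum) from a signature of the form TYPE(min, max)."""
    match = re.fullmatch(r"\s*\w+\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*", signature)
    if match is None:
        raise ValueError(f"unsupported signature: {signature!r}")
    return int(match.group(1)), int(match.group(2))


def lengths_consistent(element):
    """True if the Lengths item agrees with the signature of the data type."""
    low, high = signature_bounds(element.data_type)
    same_bounds = (element.min_length, element.max_length) == (low, high)
    # Assumption of this sketch: the average must lie within the bounds.
    return same_bounds and low <= element.avg_length <= high


last_name = DataElement("Last Name", "NAMEDATA(1, 60)", 1, 60, 8)
print(lengths_consistent(last_name))
```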

There are no value restrictions beyond those for name data.

The data element is a component of data group Name of Person described on the next visual. For each instance of the data group, it may assume one and only one value. Therefore, Minimum, Maximum, and Average all have the value 1. Role and Role Name have not been provided. The data element plays its fundamental role and is used under its defined name (Last Name) in the data group. It is not necessary, and would be redundant, to repeat the name and description of the data element.

The data element is not used as a direct attribute by any entity type.

Sample Data Elements and Groups (1 of 7)

Last Name

Name Last Name

Type Data Element

Description Last name of a person

Data Type NAMEDATA(1, 60)

Lengths Minimum Length: 1, Maximum Length: 60, Average Length: 8, Number of Digits: -, Decimal Places: -

Domain

Data Groups Name of Person Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1

Entity Types





Notes:

This visual describes data group Name of Person for data element Last Name.

The data group represents the full name for a person consisting of the last name, first name, and middle initial for the person. The data inventory must contain descriptions for the appropriate data elements. We have seen the description for data element Last Name.

Items Data Type, Lengths, and Domain do not apply to data groups.

Data group Name of Person is not itself a component of another data group. It is used as a direct (composite) attribute by entity type EMPLOYEE. For each entity instance, it assumes one and only one value (minimum cardinality = maximum cardinality = average cardinality = 1).


Name of Person

Name Name of Person

Type Data Group

Description Full name of a person consisting of last name, first name, and middle initial

Data Type N/A

Lengths N/A

Domain N/A

Data Groups

Entity Types EMPLOYEE Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1






Notes:

Data element Aircraft Number, the universal aircraft serial number for aircraft, is an alphanumeric string of 10 characters (data type ALPHANUMERIC(10, 10)). Therefore, Minimum Length, Maximum Length, and Average Length have the same value 10.

The data element is not a component of another data group. It is used as a direct attribute by entity types AIRCRAFT and MAINTENANCE RECORD.

For entity type AIRCRAFT, it is the unique identifier for the various aircraft that Come Aboard owns. Since it plays only a single role for the entity type, its fundamental role, the data element need not be named differently. As unique identifier, the data element assumes one and only one value for every instance of entity type AIRCRAFT.

For entity type MAINTENANCE RECORD, the data element represents the aircraft serial number of the aircraft for the maintenance record. Also in this case, there is no need to rename the data element since it is used in a single role by the entity type. Its original name clearly expresses the purpose it is used for. Since every maintenance record contains one and only one aircraft number, minimum, maximum, and average cardinality are all 1.


Aircraft Number

Name Aircraft Number

Type Data Element

Description Universal serial number for aircraft

Data Type ALPHANUMERIC(10, 10)


Domain

Data Groups

Entity Types AIRCRAFT Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1

MAINTENANCE RECORD Role: Aircraft for maintenance record Role Name: Aircraft Number Cardinality: Minimum = 1, Maximum = 1, Average = 1





Notes:

This visual illustrates another data element for Come Aboard, the serial number for aircraft engines.

Data element Engine Number uses the same abstract data type as data element Aircraft Number, but with different parameter values. Whereas aircraft serial numbers are 10 characters long, engine serial numbers may consist of 8 to 12 alphanumeric characters.

Using the same data type is fine as long as you want to allow the data elements to be compared with each other. If you do not want aircraft serial numbers to be compared with engine serial numbers, you should define two different abstract data types.

Engine Number is a component of data group Engine. It is not used by other data groups or directly by entity types. For each instance of data group Engine, Engine Number assumes one and only one value.


Engine Number

Name Engine Number

Type Data Element

Description Serial number for an engine of an aircraft

Data Type ALPHANUMERIC(8, 12)


Domain

Data Groups Engine Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1

Entity Types






Notes:

Data Group Engine is the data group for data element Engine Number. It has additional components such as the type of the engine and information about the manufacturer for the engine. Engine is a repetitive group for entity type AIRCRAFT. This is because an aircraft may have multiple engines mounted. Consequently, for each instance of entity type AIRCRAFT, Engine may assume multiple values each of which is composed of appropriate values for the components of Engine.

The minimum cardinality of 0 signals that aircraft need not have engines mounted. The maximum cardinality of 4 specifies that an aircraft cannot have more than four engines mounted. The average cardinality of 2 indicates that, on the average, an aircraft has two engines mounted.
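
To make the meaning of the three cardinality values concrete, the following Python sketch models them as a small class. The class name `Cardinality`, the method `allows`, and the use of `None` to stand for the asterisk (*) are assumptions of this illustration. The values are those given for data group Engine in entity type AIRCRAFT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int
    maximum: Optional[int]   # None stands for the asterisk (*): not limited
    average: float

    def allows(self, count):
        """True if `count` values are permitted for one instance."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum

    def __str__(self):
        upper = "*" if self.maximum is None else self.maximum
        return f"Minimum = {self.minimum}, Maximum = {upper}, Average = {self.average:g}"


# Engine as a repetitive group of AIRCRAFT: 0 to 4 engines, 2 on average.
engine_in_aircraft = Cardinality(minimum=0, maximum=4, average=2)

print(engine_in_aircraft)
print(engine_in_aircraft.allows(0), engine_in_aircraft.allows(5))
```

The average does not restrict anything; it is kept only as information for later design steps.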


Engine

Name Engine

Type Data Group

Description An engine for an aircraft

Data Type N/A

Lengths N/A

Domain N/A

Data Groups

Entity Types AIRCRAFT Role: - Role Name: -

Cardinality: Minimum = 0, Maximum = 4, Average = 2





Notes:

Manufacturer is a data group consisting of all information pertaining to a manufacturer. In particular, it includes:

• the manufacturer code (a unique identification for the manufacturer)

• the name of the manufacturer's company

• the address of the manufacturer

• the phone number of the manufacturer

The address of the manufacturer is again a data group.

Data group Manufacturer is a component of data group Engine of entity type AIRCRAFT. It also is a direct attribute of entity type AIRCRAFT TYPE.

This example illustrates a hierarchy of data groups:

Address → Manufacturer → Engine

Address is a data group representing an address. Assuming that Address consists of the data elements Street Address, Post Office Box, City, State, Country, and Postal Code, the tree structure for Engine looks as follows:


Manufacturer

Name Manufacturer

Type Data Group

Description All information concerning a manufacturer (e.g., manufacturer code, company name, complete address, and phone number)

Data Type N/A

Lengths N/A

Domain N/A

Data Groups Engine Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1

Entity Types AIRCRAFT TYPE Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1




Engine
    Engine Number
    Engine Type
    Manufacturer
        Manufacturer Code
        Company Name
        Address
            Street
            Post Office Box
            City
            State
            Country
            Postal Code
        Phone Number

The names of data groups are shown in bold. Indentation indicates the next level of the tree structure. Items with the same indentation are on the same level of the tree structure.

Since data group Engine is used as a composite attribute by entity type AIRCRAFT, the data elements at the terminal nodes become elementary attributes of entity type AIRCRAFT.
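
The way the terminal nodes of the tree become elementary attributes can be illustrated with a short Python sketch. The representation chosen here (nested dictionaries for data groups, `None` for data elements) and the function name `elementary_attributes` are assumptions of this example, not part of the course material.

```python
# Nested dicts stand for data groups; None marks a data element (terminal node).
ADDRESS = {
    "Street": None, "Post Office Box": None, "City": None,
    "State": None, "Country": None, "Postal Code": None,
}
MANUFACTURER = {
    "Manufacturer Code": None, "Company Name": None,
    "Address": ADDRESS, "Phone Number": None,
}
ENGINE = {"Engine Number": None, "Engine Type": None, "Manufacturer": MANUFACTURER}


def elementary_attributes(group):
    """Collect the data elements at the terminal nodes, depth first."""
    names = []
    for name, component in group.items():
        if component is None:
            names.append(name)
        else:
            names.extend(elementary_attributes(component))
    return names


print(elementary_attributes(ENGINE))
```

For the tree shown above, the function returns the eleven data elements in the order in which they appear in the tree.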





Notes:

In the illustrated example, data element Number of Engines is associated with standard data type INTEGER which is not parameterized. This means that a value range cannot be specified for the data type. However, the minimum value that Number of Engines can assume is 0. The maximum value is 4. To indicate this, you can use a domain specification as done in the example. The domain specification must be implemented by database functions such as check constraints, if available, or by checking user input.
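
As an illustration of implementing the domain 0 - 4 by a check constraint, the following Python sketch uses the SQLite database that ships with Python as a stand-in for whatever database management system is actually used. The table name `aircraft_type`, the key column `type_code`, and the inserted values are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE aircraft_type (
        type_code         TEXT PRIMARY KEY,   -- hypothetical key column
        number_of_engines INTEGER NOT NULL
            CHECK (number_of_engines BETWEEN 0 AND 4)
    )
    """
)

# A value within the domain is accepted.
conn.execute("INSERT INTO aircraft_type VALUES ('T100', 2)")

# A value outside the domain violates the check constraint.
try:
    conn.execute("INSERT INTO aircraft_type VALUES ('T200', 6)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM aircraft_type").fetchone()[0]
print(rejected, row_count)
```

If check constraints are not available, the same range test would have to be performed by the programs that accept user input.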


Number of Engines

Name Number of Engines

Type Data Element

Description Number of engines that an aircraft model may have

Data Type INTEGER

Lengths Determined by domain

Domain 0 - 4

Data Groups

Entity Types AIRCRAFT TYPE Role: - Role Name: - Cardinality: Minimum = 1, Maximum = 1, Average = 1





Figure 5-18. Methods for Establishing a Data Inventory CF182.0

Notes:

There are many ways to establish a data inventory. However, there are three methods that are normally considered when creating a data inventory:

• You can survey the departments of expertise and ask their members for the data elements and data groups needed by the application domain.

• If available, you can review existing data (in files or databases) and programs to determine the data elements and data groups needed for the application domain.

• You can develop the data inventory in parallel with the process inventory which describes the business processes for the application domain. As a business process is described, the data elements and data groups it uses become apparent.

You can use one of these methods or a combination thereof. The advantages and disadvantages of these methods are discussed on the following visuals.

Methods for Establishing a Data Inventory

Survey of departments of expertise

Review of existing data and programs

Parallel development with process inventory

Use a single method or a combination thereof




Figure 5-19. Survey of Departments of Expertise CF182.0

Notes:

This method suggests that the application domain expert asks the members of the departments of expertise for the data needed by their tasks.

The quality of the result depends on several communicative factors:

• It depends on the ability of the domain expert to extract the proper information from the members of the departments of expertise. From the discussions in the unit so far, he/she should know which information is needed. However, the answers received are frequently not very well structured. They must be scrutinized and filtered to reveal the actual facts.

• It depends on the ability of the interviewees to tell the application domain expert precisely what is needed. The members of the departments of expertise are not computer experts. Frequently, they do not have a feeling for the information needed.

• It depends on the willingness of the members of the departments of expertise to cooperate with the domain expert. Since they do not see a direct benefit for them, they may find it tiresome and annoying to be involved in the interviews. Their willingness largely depends on the pressure they are under as a consequence of their actual work.

Survey of Departments of Expertise

Application domain expert asks members of departments of expertise for data needed

Results depend on:

Ability of domain expert to extract information from members of departments

Ability of members of departments to communicate their expertise

Willingness of members of departments to cooperate with application domain expert

Easy to forget something

Easily results in superfluous data elements and data groups in data inventory

One-time effort: later changes not reflected in data inventory

Only auxiliary method




Even if the above-mentioned problems do not occur, there are some other pitfalls with this technique. The approach is fairly unstructured and, during a discussion, it is very easy to forget something as you probably know from your own experience.
Conversely, during a discussion, things may surface that exist only in the mind or imagination of the interviewee rather than being facts. This may lead to superfluous data elements and data groups in the data inventory and, thus, unnecessary fields in the database being designed.

A further disadvantage of this method is that it is a one-time effort. Consequently, later changes (e.g., new data elements or extensions of data groups) are not reflected in the data inventory.

In summary, surveying the departments of expertise is an auxiliary method rather than the primary method. Combined with other methods, it may be quite helpful, especially since it promotes contact with the members of the departments of expertise, the actual "customers" of the database being developed.




Figure 5-20. Review of Existing Data and Programs CF182.0

Notes:

This method screens existing data files (which may be on paper, in flat files, or in old databases) and program listings for data used by the application domain. From the data found, the data elements or data groups for the application domain are derived and registered in the data inventory.

For a large application domain, a great number and variety of files and documents may have to be inspected. This is not a problem as such because, whatever you do to come to a data inventory, it will cost you quite some effort; otherwise, the data inventory will be incomplete. The success depends on the availability and the quality of the documentation of the data and the programs. The poorer the documentation, the more effort you must put in.

The amount of information to be scanned may cause potential data elements or data groups to be overlooked. Conversely, you may find data elements or data groups that are not really objects of the application domain, but caused by the particular implementation used so far. You do not want these data elements and data groups in the data inventory.

Review of Existing Data and Programs

Existing data files (on paper, in flat files, etc.) and program listings are screened for data of the application domain

May have to investigate a great variety of files and documents

Result depends on quality of documentation of data and programs

May easily overlook some data

Must check if data found are:

Relevant for application domain

Implementation-dependent

Feasible, but beware of implementation-dependent data elements or data groups




Summarizing the preceding points, we can say that this is a feasible method provided the required information is available. However, you must be wary of implementation-dependent data elements or data groups and ignore them.



Figure 5-21. Coupling of Data and Process Inventories CF182.0

Notes:

The method discussed on this visual synchronizes the data inventory with the process inventory. It couples the development and maintenance of the data and process inventories. The process inventory contains a detailed description of the business processes for the application domain. It is input for application programmers and enables them to write the programs supporting the application domain. As we will see later in this unit, the description of the business processes includes, for each business process, a list of the data elements and data groups used.

As a business process is described or changed, the affected data elements or data groups are described or their description is updated in the data inventory. Thus, the process inventory and the data inventory remain synchronized at all times. If processes require new data elements or data groups, they are associated with the proper entity types as described before. If an entity type cannot be found for a data element or data group, the entity-relationship model must be changed as well. As you can imagine, this will result in changes for your database.
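
Keeping the two inventories synchronized amounts to a pair of set comparisons. The following Python sketch shows one possible form of such a check. The dictionary layout, the function names, and the process "Record Maintenance" are assumptions made for this illustration; the element names are examples only.

```python
# Names of the data elements and data groups described in the data inventory.
data_inventory = {
    "Flight Number", "Airport Code", "Employee Number", "Last Name", "First Name",
}

# For each business process, the data elements and data groups it uses.
process_inventory = {
    "Assign Captain for Flight": {"Flight Number", "Airport Code", "Employee Number"},
    "Record Maintenance": {"Aircraft Number", "Employee Number"},  # invented process
}


def missing_from_data_inventory(processes, inventory):
    """Map each process title to the names it uses that the data inventory lacks."""
    gaps = {}
    for title, used in processes.items():
        missing = sorted(used - inventory)
        if missing:
            gaps[title] = missing
    return gaps


def unused_in_processes(processes, inventory):
    """Names in the data inventory that no documented process refers to."""
    used = set().union(*processes.values()) if processes else set()
    return sorted(inventory - used)


print(missing_from_data_inventory(process_inventory, data_inventory))
print(unused_in_processes(process_inventory, data_inventory))
```

The first result lists descriptions that still have to be added to the data inventory; the second lists candidates for data that no documented process needs.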

Coupling of Data and Process Inventories

Jointly develop and maintain process and data inventories

Process Inventory

Data Inventory

Process inventory contains a description of all business processes for application domain

For each business process, process inventory lists all data used by process

Data elements and data groups identified for processes are described in data inventory

Including role of data element or data group

If process changes, data inventory is updated

Responsibility where it belongs to:

With processes

Only data needed by processes in data inventory




The advantage of this approach is that it leaves the responsibility for the data elements and data groups where it belongs, namely, with the business processes. As a consequence, the data inventory contains the data elements and data groups for the documented processes and only for those. These may be existing or planned business processes. A positive side effect is that planned business processes must have materialized at least far enough to be documented; they are no longer vague ideas in people's heads that are never realized.
The discussed method ensures that all needed data elements and data groups are in the data inventory and, thus, will be in the database.







5.2 Process Inventory



Figure 5-22. Process Inventory - Purpose and Responsibilities CF182.0

Notes:

As mentioned before, the process inventory contains a detailed description of all business processes for the application domain. The descriptions should be completely business oriented. They should be independent of any implementation considerations.

The process inventory is established by the application domain expert because he/she has the overall knowledge of the application domain required. Of course, he/she needs to discuss the business processes and verify their descriptions with the departments of expertise. Whereas the members of the departments of expertise are frequently not willing to discuss the data elements or data groups, they generally are interested in talking about the business processes. The reason is that the business processes represent their daily work. They want to ensure that their implementation makes their work as easy as possible.

The process inventory is input for the application programmers. It must allow them to understand the business processes for the application domain and to develop the required programs, queries, etc.

The descriptions for the business processes must identify all data elements and data groups for the business processes. Since the data elements and data groups are process independent, they are not described in the process inventory, but in the data inventory. The business processes only refer to the data elements and data groups in the data inventory. Therefore, the data inventory is also input for the application programmers.

Process Inventory - Purpose and Responsibilities

Detailed description of all business processes for application domain
    Strictly business oriented
    Implementation independent

Created by application domain expert

Input for application programmers
    Must allow application programmers to:
        Understand processes for application domain
        Develop programs for processes

Must identify all data for processes
    References to data elements and data groups in data inventory
    Data inventory also input for application programmers

Allows to verify entity-relationship model

Database designer not involved

When describing a business process, the application domain expert should verify that the entity-relationship model contains all entity types and relationship types necessary for the implementation of the business process. We will discuss this in more detail on one of the subsequent visuals.

Except for assisting the application domain expert in verifying the entity-relationship model, the database designer is not involved in the establishment of the process inventory.




Figure 5-23. Contents for a Business Process (1 of 2) CF182.0

Notes:

For each business process, the process inventory should contain the following items:

Title

The unique title under which the business process is known throughout the application domain. The business process should be easily recognizable from the title.

Purpose

A short description of the purpose of the business process, i.e., an outline of what, from a business perspective, the business process is supposed to achieve.

Input

A description of all data, including their role, that are external input for the business process. This means a description of all data that are perceived as input by the (end) users of the business process. In particular, these may be data entered by them in entry fields or selected via check boxes, radio buttons, or combination boxes.

Contents for a Business Process (1 of 2)

Title: The unique title under which the business process is known throughout the application domain

Purpose: A short description of the purpose of the business process for the application domain

Textual Description: A detailed textual description of the steps of the business process

Needed for departments of expertise and end users

Formal Description: A formal description of the conditions, rules, and actions for the business process

For application programmers

For example, decision tables

Input: A description of all data, including their role, being external input for the business process

For example, data entered by the end user

For example, aircraft number for maintenance record and not just aircraft number




As mentioned, for each input, its role should be identified. For example, the description should not just say "aircraft number", but rather "aircraft number for the maintenance record of the specified aircraft". This is important for the application programmers for two reasons:
1. They may have to provide an appropriate description for the corresponding input field on a window or in help information.

2. They need to know which data to access. As you know already from the entity-relationship model, the aircraft number may occur in multiple entity types and, thus, later on, in multiple tables.

Textual Description

A detailed textual description of the various steps of the business process. A textual description is necessary since the description must be verified by the departments of expertise and will be available to the users of the business process. Generally, a formal description of the business process is not understood by these people. The textual description also helps the database designer when verifying design steps by means of the business processes described in the process inventory.

Formal Description

A formal description (for example, by means of decision tables) of the conditions, rules, and actions for the business process. Application programmers may prefer such a formal description over a textual description because it is more precise and, thus, eases their task.
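
To give an impression of what a decision table might look like, the following Python sketch encodes one as a list of rules. It is only an illustration: the conditions and actions are modeled on the sample business process Assign Captain for Flight described later in this unit, and the names `DECISION_TABLE` and `decide` as well as the wording of the actions are assumptions of this example.

```python
# Each rule pairs a tuple of conditions with an action. The conditions are:
#   (flight and pilot exist, pilot has license for aircraft model,
#    pilot already assigned to flight, flight already has a captain)
# None means that the condition does not matter for the rule.
DECISION_TABLE = [
    ((False, None, None, None), "error: flight or pilot does not exist"),
    ((True, False, None, None), "error: pilot cannot fly the aircraft model"),
    ((True, True, True, None), "message: pilot already assigned to flight"),
    ((True, True, False, True), "message: show current captain"),
    ((True, True, False, False), "assign pilot as captain"),
]


def decide(*facts):
    """Return the action of the first rule whose conditions match the facts."""
    for conditions, action in DECISION_TABLE:
        if all(c is None or c == f for c, f in zip(conditions, facts)):
            return action
    raise LookupError("no rule matches; the decision table is incomplete")


print(decide(True, True, False, False))
```

Because every combination of conditions is covered by exactly one rule, a programmer can read off the required action without interpreting prose.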




Figure 5-24. Contents for a Business Process (2 of 2) CF182.0

Notes:

In addition to the items on the previous visual, the description for a business process should contain the following items:

Output

A description of all data, including their role, which are external output for the business process, i.e., perceived as output by the users of the business process. In particular, these may be data displayed in a window or in a listing. It may also be something as abstract as an interrelationship established by the business process (e.g., the assignment of an aircraft to a flight) or a message.

Furthermore, the output may be conditional. This means, it can depend on the input provided for the business process and on situations encountered during its execution.

As mentioned, for each output, its role should be identified as far as applicable. For example, the description should not just say "airport code", but rather "airport code for airport of departure for leg". Application programmers need this information to properly describe the output on windows or in listings.

Contents for a Business Process (2 of 2)

Others: Other items needed by application programmers such as window formats or listing formats

Output: A description of all data, including their role, which are external output for the business process

For example, data displayed on a screen or in a listing

For example, airport code for airport of departure for leg and not just airport code

Data Read: A description of all data elements or data groups internally read by the business process

For each data element or data group, provide:

Its name in the data inventory

All purposes for which it is read (roles)

Data Written: A description of all data elements or data groups internally written by the business process

For each data element or data group provide:

Its name in the data inventory

All purposes for which it is written (roles)




Data Read
A detailed description of all data elements or data groups read internally when the business process is executed. For each data element or data group, provide its name in the data inventory and all purposes it is read for by the business process. The purposes identify the roles the data element or data group plays for the business process. The roles/purposes are important for the correct assignment of the data element or data group to entity types in the data inventory.

Data Written

A detailed description of all data elements or data groups written internally when the business process is executed. For each data element or data group, provide its name in the data inventory and all purposes it is written for by the business process. The purposes identify the roles the data element or data group plays for the business process. The roles/purposes are important for the correct assignment of the data element or data group to entity types in the data inventory.

Others

The description may contain many other items such as window or listing formats. However, these are not of immediate interest for the database designer and, therefore, not discussed here.




Figure 5-25. Sample Business Process (1 of 5) CF182.0

Notes:

The next few visuals show the description of a business process for our sample airline company called Come Aboard. The business process assigns a pilot as captain to a flight. Appropriately enough, the unique title under which the business process is known throughout the application domain is Assign Captain for Flight.

Item Purpose explains in more detail what the business process will accomplish: It will assign the specified pilot to the specified flight.

The input for the business process must identify the flight as well as the pilot who becomes the captain for the flight. To identify the flight, the flight number, the airport of departure, the airport of arrival, and the locator for the flight must be provided. Note that the addendum "for flight" in the visual identifies the role of the input. This is important because flight number, airport of departure, and airport of arrival can be used to identify different things (e.g., legs rather than flights).

Sample Business Process (1 of 5)

Business Processes

Assign Captain for Flight

Purpose: Assign the specified pilot as captain to the specified flight.

Input:

Identifies flight:
    Flight number for flight
    Airport of departure for flight
    Airport of arrival for flight
    Flight locator for flight

Identifies pilot:
    Employee number for pilot to be assigned to flight






Notes:

This visual lists the individual steps that, from a business perspective, must be performed by the business process. It does not describe an implementation. The implementation may look completely different and, in this case, will: it can make use of two constraints of the entity-relationship model, provided these are implemented. As a consequence of the constraints, much of the checking for the business process need not be implemented since it is handled by the constraints.

Because the description is self-explanatory, we need not discuss it further here.


Textual Description: This business process performs the following operations:

1. It is verified that the specified flight and pilot exist. If flight or pilot do not exist, an appropriate error message is displayed and the business process ends.

2. If pilot and flight exist, it is checked if the pilot has the license to fly the aircraft model for the leg for the flight. If the pilot cannot fly the aircraft model, an appropriate error message is displayed and the business process ends.

3. If the pilot has the license to fly the aircraft model, it is checked if the pilot has already been assigned to the flight. If the pilot is already captain or copilot for the flight, an appropriate message is displayed and the business process ends.

4. If the pilot has not yet been assigned to the flight, it is checked if another pilot is already captain for the flight. If so, a message is displayed containing employee number, last name, and first name of the current captain and the business process ends.

5. If a captain has not yet been assigned to the flight, the specified pilot becomes the captain for the flight.

6. A message is displayed confirming that the pilot has been assigned as captain to the flight. The message includes employee number, last name, and first name of the assigned captain.
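
The six steps of the textual description can be traced in a short Python sketch. It deliberately follows the business steps one by one and is not meant as the implementation, which may look quite different. Several simplifications are assumptions of this sketch: a flight is identified by its locator alone, each flight has a single aircraft model, and the classes `Pilot` and `Flight`, the function `assign_captain`, the message texts, and the sample values are all invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pilot:
    employee_number: str
    last_name: str
    first_name: str
    licensed_models: set = field(default_factory=set)


@dataclass
class Flight:
    locator: str
    aircraft_model: str
    captain: Optional[str] = None   # employee number of the captain, if any
    copilot: Optional[str] = None   # employee number of the copilot, if any


def assign_captain(flights, pilots, locator, employee_number):
    """Follow steps 1 to 6 of the textual description; return the message shown."""
    flight, pilot = flights.get(locator), pilots.get(employee_number)
    if flight is None or pilot is None:                        # step 1
        return "Error: flight or pilot does not exist."
    if flight.aircraft_model not in pilot.licensed_models:     # step 2
        return "Error: pilot has no license for the aircraft model."
    if employee_number in (flight.captain, flight.copilot):    # step 3
        return "Pilot is already assigned to the flight."
    if flight.captain is not None:                             # step 4
        current = pilots[flight.captain]
        return (f"Flight already has a captain: {current.employee_number} "
                f"{current.last_name}, {current.first_name}.")
    flight.captain = employee_number                           # step 5
    return (f"Assigned as captain: {pilot.employee_number} "   # step 6
            f"{pilot.last_name}, {pilot.first_name}.")


pilots = {
    "E1": Pilot("E1", "Smith", "Ann", {"M1"}),
    "E2": Pilot("E2", "Jones", "Bob", {"M1"}),
}
flights = {"F1": Flight("F1", "M1")}

first = assign_captain(flights, pilots, "F1", "E1")    # captain is assigned
second = assign_captain(flights, pilots, "F1", "E2")   # flight already has a captain
print(first)
print(second)
```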





Notes:

This visual illustrates the external output for the sample business process assigning a pilot as captain to a flight.

The flight information, i.e., the flight number, the airport of departure, the airport of arrival, and the locator for the flight, is always returned as output. These items also were input for the business process. Again, the addendum "for flight" on the visual indicates the role of the output.

The further output is dependent on conditions encountered during the execution of the business process:

If another pilot has already been assigned as captain to the flight, employee number, last name, and first name of that pilot and the employee number of the specified pilot are returned. (As a consequence, the specified pilot was not assigned to the flight.)

If the specified pilot has been assigned as captain to the flight, his/her employee number, last name, and first name are returned. In addition, a message is issued that the pilot has been assigned successfully. Accordingly, the fact that the pilot has been assigned to the flight is perceived as an output by the user of the business process.

Output:

Flight number for flight
Airport of departure for flight
Airport of arrival for flight
Flight locator for flight

If another pilot has already been assigned as captain to the flight:
    Employee number of currently assigned captain
    Last name of currently assigned captain
    First name of currently assigned captain
    Employee number of pilot not assigned to flight

If the specified pilot has been assigned as captain to the flight:
    Pilot assigned as captain to flight
    Employee number of newly assigned captain
    Last name of newly assigned captain
    First name of newly assigned captain

You could think of further conditions resulting in different output. Such a condition is that the pilot has already been assigned as copilot to the flight. These conditions and their output should be described as well. We have not done this here to keep the output for the sample business process on a single visual.




Figure 5-28. A Walk Through the ER Model CF182.0

Notes:

For each business process, you should verify the entity-relationship model for the application domain. You should check if it contains all required entity types and relationship types by scrutinizing all steps of the business process.

To determine the entity types needed, you must determine the data elements and data groups used by the steps. Thus, in the course of the verification, you determine all data elements and data groups read or written by the business process.

When verifying the entity-relationship model for a business process, you perform a walk through the entity-relationship model and determine the view needed for the business process.

We will do this now for the sample business process assigning the captain for a flight. The steps of the business process have been described on page 5-47. They will be repeated here as far as required for the understanding:

1. The first step of the business process verifies that the specified pilot and flight exist. If not, an appropriate message is displayed and the business process ends.

[Visual: A Walk Through the ER Model. Entity-relationship diagram for the application domain with entity types EMPLOYEE, PILOT, MECHANIC, PILOT ASSIGNMENT, FLIGHT, LEG, ITINERARY, AIRPORT, AIRCRAFT, AIRCRAFT MODEL, AIRCRAFT TYPE, and MAINTENANCE RECORD. Relationship labels on the visual include _is_, _assigned_to_, _by_, _as_, _can_fly_, _for_, _in_, _from_, _scheduled_for_, _can_land_at_, _trained_for_, _nonstop_to_, and _belongs_to_. The layout and cardinalities are not reproduced here.]




The specified flight exists if entity type FLIGHT contains an entity instance for the specified flight number, airport of departure, airport of arrival, and flight locator. Thus, entity type FLIGHT must have attributes for the following data elements:

Entity Type      Data Element/Data Group
FLIGHT           Flight Number (flight number for a flight)
                 Airport Code (airport of departure for a flight)
                 Airport Code (airport of arrival for a flight)
                 Flight Locator (flight locator for a flight)

Their roles are specified in parentheses. As you can see, data element Airport Code is used in two roles. Logically, these attributes are read from entity type FLIGHT to verify that the specified flight exists.

The specified pilot exists if entity type PILOT contains an entity instance for the specified employee number. Thus, entity type PILOT must have an attribute for data element Employee Number:

Entity Type      Data Element/Data Group
PILOT            Employee Number (employee number for a pilot)

As we know, this attribute is the entity key.

As you would expect, for entity type PILOT, the data element plays the role of employee number for a pilot. It is immaterial that the business process reads the employee number for the pilot specified as input since being specified as input does not constitute a characteristic of pilots. It only limits the entity instances read.

2. The second step of the business process checks if the specified pilot has the license to fly the aircraft model for the leg for the flight. If he/she does not have the license, an appropriate message is displayed and the business process ends.

To determine if the pilot has the required license for the flight, we must first determine the leg for the flight and the aircraft model for the leg. Then, we must see if the specified pilot has the license to fly the aircraft model we have determined. Using the entity-relationship model, we can accomplish this by:

• Navigating from entity type FLIGHT to entity type LEG via relationship type FLIGHT_for_LEG to determine the leg for the flight.

• Navigating from entity type LEG to entity type AIRCRAFT MODEL via relationship type AIRCRAFT MODEL_for_LEG to find the aircraft model for the leg for the flight.

• Checking if relationship type PILOT_can_fly_AIRCRAFT MODEL contains an instance for the specified pilot and the aircraft model just determined.




Thus, the entity-relationship model includes all entity types and relationship types required for the step.
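The navigation described for this step can be sketched in Python over small in-memory collections. This is only an illustrative sketch under stated assumptions: the dictionary names, the sample values, and the function name pilot_has_license are made up for the example and are not part of the course material. Each relationship type is modeled by the keys of its source and target.

```python
# Illustrative sample data only; all values are made up for this sketch.
# A flight key is (flight number, departure airport, arrival airport, flight locator).
# A leg key is (flight number, departure airport, arrival airport).

# Relationship type FLIGHT_for_LEG: flight key -> leg key.
FLIGHT_FOR_LEG = {
    ("CA100", "FRA", "JFK", "0001"): ("CA100", "FRA", "JFK"),
}

# Relationship type AIRCRAFT MODEL_for_LEG: leg key -> (type code, model number).
AIRCRAFT_MODEL_FOR_LEG = {
    ("CA100", "FRA", "JFK"): ("B747", "400"),
}

# Relationship type PILOT_can_fly_AIRCRAFT MODEL:
# set of (employee number, aircraft model key) pairs.
PILOT_CAN_FLY = {
    (4711, ("B747", "400")),
}


def pilot_has_license(employee_number, flight_key):
    """Walk FLIGHT -> LEG -> AIRCRAFT MODEL, then check PILOT_can_fly."""
    leg_key = FLIGHT_FOR_LEG[flight_key]          # leg for the flight
    model_key = AIRCRAFT_MODEL_FOR_LEG[leg_key]   # aircraft model for the leg
    return (employee_number, model_key) in PILOT_CAN_FLY
```

Every lookup in the sketch uses only key values of the entity types involved, which is why no entity type beyond FLIGHT, LEG, AIRCRAFT MODEL, and PILOT is needed for this step.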

Accessing a relationship type means accessing its defining attributes since they completely describe the relationship instances. As we know, the defining attributes are the keys of the source and target for the relationship type. Consequently, source and target of the relationship type are the primary receptacles for the data elements and data groups corresponding to the defining attributes. If they do not contain them, the relationship type cannot contain them. If they are their keys, the relationship type will automatically contain them. Therefore, in the data inventory, the data elements/data groups for the accessed defining attributes are associated with the source and target entity types rather than with the relationship type.
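The convention can be illustrated with a short Python sketch. It assumes the entity keys suggested by this walk-through (Employee Number for PILOT; Type Code and Model Number for AIRCRAFT MODEL); the dictionary and function names are hypothetical.

```python
# Assumed entity keys, taken from the data elements named in the walk-through.
ENTITY_KEYS = {
    "PILOT": ["Employee Number"],
    "AIRCRAFT MODEL": ["Type Code", "Model Number"],
}


def defining_attributes(source, target):
    """Return (entity type, data element) pairs defining a relationship type.

    The defining attributes are the keys of source and target, so each data
    element stays associated with its entity type, not with the relationship.
    """
    return [
        (entity, element)
        for entity in (source, target)
        for element in ENTITY_KEYS[entity]
    ]


# Defining attributes of PILOT_can_fly_AIRCRAFT MODEL
attrs = defining_attributes("PILOT", "AIRCRAFT MODEL")
```

The result lists each data element together with the entity type it is recorded under in the data inventory.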

In view of this convention, the walk through the entity-relationship model for this step of the business process requires the following data elements for the indicated entity types. The roles are included in parentheses:

Entity Type      Data Element/Data Group
LEG              Flight Number (flight number for a leg)
                 Airport Code (airport of departure for a leg)
                 Airport Code (airport of arrival for a leg)
AIRCRAFT MODEL   Type Code (type code for an aircraft model)
                 Model Number (model number for an aircraft model)

The data elements for entity types FLIGHT and PILOT have already been identified for the previous step and are not repeated in the data inventory. For the roles, similar considerations apply as for the first step.

3. The third step of the business process checks if the specified pilot has already been assigned to the flight. If he/she has already been assigned to the flight, an appropriate message is displayed and the process ends.

It does not matter whether the specified pilot has been assigned as captain or copilot. In both cases, he/she cannot be assigned again to the flight.

In the entity-relationship model, relationship type PILOT_assigned_to_FLIGHT must be used to determine if the pilot has already been assigned to the flight. Since it is not of interest whether the pilot has been assigned as captain or copilot, entity type PILOT ASSIGNMENT is not needed.

Accordingly, this step of the business process uses the following data elements in the indicated entity types:

All data elements and roles have already been identified before. Therefore, they are not repeated in the data inventory.

4. The fourth step of the business process checks if another pilot is already captain for the flight. If so, a message is displayed, containing employee number, last name, and first name of that pilot, and the business process ends.

Using the entity-relationship model, you can accomplish this by navigating from entity type FLIGHT to entity type PILOT ASSIGNMENT via relationship types PILOT_assigned_to_FLIGHT and PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT and inspecting the function for the pilot assignments.

Following the above convention concerning relationship types, this traversal of the entity-relationship model requires that the following data elements have attributes in the indicated entity types:

Entity Type        Data Element/Data Group
PILOT ASSIGNMENT   Flight Number (flight number for a pilot assignment)
                   Airport Code (airport of departure for a pilot assignment)
                   Airport Code (airport of arrival for a pilot assignment)
                   Flight Locator (flight locator for a pilot assignment)
                   Employee Number (employee number for a pilot assignment)
                   Pilot Function (pilot function for a pilot assignment)

All data elements and roles for entity types FLIGHT and PILOT have been identified before. They are not repeated in the data inventory.

If another captain has already been assigned to the flight, you must navigate from entity type PILOT ASSIGNMENT to entity type PILOT via relationship types PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT and PILOT_assigned_to_FLIGHT. Using relationship type EMPLOYEE_is_PILOT, you must continue on to entity type EMPLOYEE. There, you find the last name and first name of the current captain for the flight needed for the message being issued.

This requires the following additional data elements for the indicated entity types:

Entity Type      Data Element/Data Group
EMPLOYEE         Employee Number (employee number for an employee)
                 Last Name (last name of an employee as part of data group Name of Person used by entity type EMPLOYEE)
                 First Name (first name of an employee as part of data group Name of Person used by entity type EMPLOYEE)

5. In the fifth step of the business process, the specified pilot becomes the captain of the specified flight.

For the entity-relationship model this means that instances must be added to entity type PILOT ASSIGNMENT and relationship types PILOT_assigned_to_FLIGHT and PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT.





As a consequence, the following data elements in entity type PILOT ASSIGNMENT are written (see also page 5-57):

Entity Type        Data Element/Data Group
PILOT ASSIGNMENT   Flight Number (flight number for a pilot assignment)
                   Airport Code (airport of departure for a pilot assignment)
                   Airport Code (airport of arrival for a pilot assignment)
                   Flight Locator (flight locator for a pilot assignment)
                   Employee Number (employee number for a pilot assignment)
                   Pilot Function (pilot function for a pilot assignment)

The fact that instances are added to relationship types PILOT_assigned_to_FLIGHT and PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT does not imply that the data elements for their defining attributes are updated in their sources and targets. Therefore, they will not be listed under Data Written in the description of the business process. In contrast, you may claim that they must be contained in the respective source or target and should be listed under Data Read.

6. The sixth step displays the message confirming that the pilot has been assigned as captain to the flight. The message includes employee number, last name, and first name of the newly assigned captain.

This requires us to access entity type EMPLOYEE for the pilot being assigned to obtain his/her last name and first name.

As a consequence, the following data elements are needed (read) in the indicated entity types:

Entity Type      Data Element/Data Group
EMPLOYEE         Employee Number (employee number for an employee)
                 Last Name (last name of an employee as part of data group Name of Person used by entity type EMPLOYEE)
                 First Name (first name of an employee as part of data group Name of Person used by entity type EMPLOYEE)

All data elements, data groups, and roles have been identified before and, therefore, are not repeated in the data inventory.

We could successfully identify all required entity types and relationship types for the business process and assign all data elements or data groups to entity types. Thus, the entity-relationship model is complete for this business process.
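Steps 3 to 6 of the walk-through can be sketched in Python as follows. This is a minimal sketch, not an implementation prescribed by the course: the in-memory dictionaries, the sample employees, the function name assign_captain, and the shape of the returned messages are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-ins for the entity types used by steps 3 to 6.
# EMPLOYEE: employee number -> (last name, first name); sample values are made up.
EMPLOYEE = {4711: ("Smith", "Anna"), 4712: ("Jones", "Bob")}

# PILOT ASSIGNMENT: (flight key, employee number) -> pilot function.
PILOT_ASSIGNMENT = {}


def assign_captain(flight_key, employee_number):
    """Return a tuple describing the outcome of steps 3 to 6."""
    # Step 3: the pilot must not already be assigned to the flight,
    # regardless of whether as captain or copilot.
    if (flight_key, employee_number) in PILOT_ASSIGNMENT:
        return ("already assigned", employee_number)

    # Step 4: another pilot must not already be captain for the flight.
    for (assigned_flight, captain), function in PILOT_ASSIGNMENT.items():
        if assigned_flight == flight_key and function == "captain":
            last, first = EMPLOYEE[captain]          # read from EMPLOYEE
            return ("captain exists", captain, last, first, employee_number)

    # Step 5: write the new pilot assignment (Data Written).
    PILOT_ASSIGNMENT[(flight_key, employee_number)] = "captain"

    # Step 6: confirm, using the name read from EMPLOYEE.
    last, first = EMPLOYEE[employee_number]
    return ("assigned", employee_number, last, first)
```

Note that only PILOT ASSIGNMENT is modified; EMPLOYEE is only read, which matches the split between Data Written and Data Read.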





Notes:

The data elements and data groups on this visual have already been discussed in the notes for the previous visual on page 5-50.

Note that column Contained In is not part of the description for a business process. It has been added here to indicate the data groups and entity types the various data elements/data groups will be associated with in the data inventory. It does not make sense to describe this in the process inventory. The implementation of the business processes must be based on the actual tables rather than on the entity types of the entity-relationship model.


Data Read:

Element/Group     Role/Purpose                                  Contained In
Flight Number     Flight number for a flight                    FLIGHT
                  Flight number for a leg                       LEG
                  Flight number for a pilot assignment          PILOT ASSIGNMENT
Airport Code      Airport of departure for a flight             FLIGHT
                  Airport of arrival for a flight               FLIGHT
                  Airport of departure for a leg                LEG
                  Airport of arrival for a leg                  LEG
                  Airport of departure for a pilot assignment   PILOT ASSIGNMENT
                  Airport of arrival for a pilot assignment     PILOT ASSIGNMENT
Flight Locator    Flight locator for a flight                   FLIGHT
                  Flight locator for a pilot assignment         PILOT ASSIGNMENT
Type Code         Type code for an aircraft model               AIRCRAFT MODEL
Model Number      Model number for an aircraft model            AIRCRAFT MODEL
Employee Number   Employee number for an employee               EMPLOYEE
                  Employee number for a pilot                   PILOT
                  Employee number for a pilot assignment        PILOT ASSIGNMENT
Last Name         Last name of an employee                      Name of Person, EMPLOYEE
First Name        First name of an employee                     Name of Person, EMPLOYEE
Pilot Function    Pilot function for a pilot assignment         PILOT ASSIGNMENT






Notes:

This visual illustrates the data elements written by the sample business process. They have already been discussed on page 5-50 ff.

Note that column Contained In is not part of the description for a business process. It has been added here to indicate the data groups and entity types the various data elements/data groups will be associated with in the data inventory.


Data Written:

Element/Group     Role/Purpose                                  Contained In
Flight Number     Flight number for a pilot assignment          PILOT ASSIGNMENT
Airport Code      Airport of departure for a pilot assignment   PILOT ASSIGNMENT
                  Airport of arrival for a pilot assignment     PILOT ASSIGNMENT
Flight Locator    Flight locator for a pilot assignment         PILOT ASSIGNMENT
Employee Number   Employee number for a pilot assignment        PILOT ASSIGNMENT
Pilot Function    Pilot function for a pilot assignment         PILOT ASSIGNMENT
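Because the description of a business process refers to data elements by their unique names, the Data Written list can be cross-checked against the data inventory mechanically. The following Python sketch shows one way to do that; the set DATA_INVENTORY, the row layout, and the function name missing_from_inventory are assumptions made for this illustration.

```python
# Names of data elements assumed to be in the data inventory (from the walk-through).
DATA_INVENTORY = {
    "Flight Number", "Airport Code", "Flight Locator", "Employee Number",
    "Pilot Function", "Last Name", "First Name", "Type Code", "Model Number",
}

# Data Written rows as (element/group, role/purpose, contained in).
DATA_WRITTEN = [
    ("Flight Number", "Flight number for a pilot assignment", "PILOT ASSIGNMENT"),
    ("Airport Code", "Airport of departure for a pilot assignment", "PILOT ASSIGNMENT"),
    ("Airport Code", "Airport of arrival for a pilot assignment", "PILOT ASSIGNMENT"),
    ("Flight Locator", "Flight locator for a pilot assignment", "PILOT ASSIGNMENT"),
    ("Employee Number", "Employee number for a pilot assignment", "PILOT ASSIGNMENT"),
    ("Pilot Function", "Pilot function for a pilot assignment", "PILOT ASSIGNMENT"),
]


def missing_from_inventory(rows, inventory):
    """Return the sorted element names used by a process but not in the inventory."""
    return sorted({element for element, _role, _entity in rows} - inventory)
```

An empty result means every element named by the business process is known to the data inventory; any name returned points to a gap in the inventory.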




Figure 5-31. Process Decomposition CF182.0

Notes:

To ensure the completeness of the data inventory, you need a comprehensive process inventory. This requires that you have a complete set of the business processes for the application domain, i.e., the processes (tasks) actually performed by the application domain.

One technique for obtaining a comprehensive set of business processes is process decomposition. It is a step-by-step decomposition of the application domain into groups of related business processes and, finally, individual business processes.

Process decomposition is an iterative process. The next iteration is a refinement of the previous iteration and creates business-related subgroups (subsets) for the groups resulting from the previous iteration. The iteration stops when a group finally consists of a single business process.

Process decomposition is a pure grouping of the business processes based on the tasks performed by the application domain. It just describes which business processes are performed by the various subfunctions of the application domain. The same business process may be performed by multiple subfunctions.

Process Decomposition

• To attain a complete set of business processes for the application domain
  - Business process = Process actually performed by the application domain

• Step-by-step decomposition of application domain into groups of related business processes and, finally, individual business processes
  - Next iteration is a refinement of the previous iteration
  - Next iteration creates business-related subsets of groups for previous iteration

• Iteration stops if group consists of a single business process
  - Independent of whether or not the business process will employ other business processes to achieve its task (implementation detail)
  - Business process then described in process inventory

• Result is a process tree
  - Lowest level are business processes to be described in process inventory
  - Higher levels are groups of related business processes

• Only a grouping of business processes
  - Does not specify if a business process internally uses another business process to accomplish its task
  - Does not imply an implementation structure




Process decomposition neither considers nor reflects whether or not a business process internally uses other business processes to perform its work. For example, the business process displaying all maintenance records for an aircraft may very well use the business process displaying an individual maintenance record. However, this is not a concern of process decomposition and not reflected in its output.
Neither does process decomposition occupy itself with modules internally used or invocation sequences. These are implementation details. Only externally visible tasks, i.e., tasks performed by the application domain, are considered and reflected. Remember that we still are in the conceptual view. At this stage, you should not make any assumptions about the implementation of the business processes.

As a business process is identified during process decomposition, it is described in the process inventory.

The result of process decomposition is a process tree. The nodes at the lowest level of the process tree are the business processes described in the process inventory. The nodes at the higher levels are groups of related business processes. They act like folders of directory structures. The process tree groups the business processes in accordance with their usage by subfunctions of the application domain.

The process tree does not imply an implementation structure or an invocation sequence. It does not specify if a business process internally uses another business process to accomplish a task. It neither establishes nor enforces that separate business processes become separate programs or queries.
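A process tree of this kind can be sketched as a simple nested structure in Python. The representation below (a group as a pair of name and children, a business process as a plain string) and the function name business_processes are assumptions for illustration; the node names are an excerpt borrowed from the sample airline company.

```python
def business_processes(node):
    """Collect the leaves (business processes) of a process tree.

    A group is a (name, children) pair; a business process is a string.
    """
    if isinstance(node, str):
        return [node]
    _name, children = node
    leaves = []
    for child in children:
        leaves.extend(business_processes(child))
    return leaves


# Excerpt of a process tree; higher levels are groups, leaves are processes.
tree = ("Aviation", [
    ("Flight Management", [
        ("Pilot Assignment", [
            "Display Pilots for Flight",
            "Display Flights for Pilot",
        ]),
    ]),
    ("Itinerary Management", [
        "Create Itinerary",
        "Display Itinerary",
    ]),
])
```

The structure only groups business processes. Nothing in it records whether one process calls another, which mirrors the point that the process tree implies no implementation structure.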

The process tree should be incorporated in the process inventory. It provides an overview of the business processes described in the process inventory.




Figure 5-32. Process Decomposition for CAB (1 of 2) CF182.0

Notes:

This visual illustrates a part of the process tree for our sample airline company called Come Aboard.

We have used folders for the higher-level nodes to demonstrate the similarity to directory structures on personal computers. A folder contains a list of items. In our case, these are business processes or other folders containing business processes.

The top-level folder (node), called Aviation, represents the entire application domain. It covers all business processes for the application domain.

The first iteration of the process decomposition resulted in groups of business processes for subfunctions Airport Management, Itinerary Management, Flight Management, Aircraft Management, and Aircraft Maintenance. Some additional subfunctions (such as Employee Management) are not shown on the visual.

The visual shows a second iteration for Flight Management resulting in groups of business processes for subfunctions Aircraft Assignment and Pilot Assignment. The business processes for these groups (third iteration) are also listed on the visual.

Process Decomposition for CAB (1 of 2)

AVIATION
    AIRPORT MANAGEMENT
    ITINERARY MANAGEMENT
    FLIGHT MANAGEMENT
        AIRCRAFT ASSIGNMENT
            Display Flights for Aircraft
            Assign Aircraft to Flight
            Change Aircraft for Flight
            Remove Aircraft for Flight
            Display Aircraft for Flight
            Display Aircraft Model for Flight
            Display Aircraft Type for Flight
            Display All Aircraft Information for Flight
        PILOT ASSIGNMENT
            Change Captain for Flight
            Remove Captain for Flight
            Assign Copilot for Flight
            Change Copilot for Flight
            Remove Copilot for Flight
            Display Pilots for Flight
            Display Flights for Pilot
    AIRCRAFT MANAGEMENT
    AIRCRAFT MAINTENANCE




Many business processes access a single business object type or business relationship type or, if you prefer, a single entity type or relationship type. However, there are also business processes accessing multiple entity types and/or relationship types.
Note that business process Display All Aircraft Information for Flight might invoke business processes Display Aircraft for Flight, Display Aircraft Model for Flight, and Display Aircraft Type for Flight. Since this is an implementation detail, the process tree does not show it. The actual implementation may look different.

Business processes Display Flights for Pilot and Display Flights for Aircraft may very well be used by other subfunctions of the application domain as well. The first business process may be used by Employee Management; the second by Aircraft Maintenance. The unique title for a business process prevents it from being implemented twice. You can view the business process as belonging to one subfunction (its major user) and the other subfunctions having shortcuts for it.
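The idea of one owning subfunction plus shortcuts can be sketched in Python as a small registry keyed by the unique title. The names owners, shortcuts, and register are hypothetical, and treating the first subfunction registered as the major user is a simplifying assumption of this sketch.

```python
owners = {}     # unique title -> owning subfunction (assumed: first one registered)
shortcuts = {}  # unique title -> set of other subfunctions using the process


def register(title, subfunction):
    """Record that a subfunction uses the business process with this title."""
    if title not in owners:
        owners[title] = subfunction                      # described once
    elif owners[title] != subfunction:
        shortcuts.setdefault(title, set()).add(subfunction)  # shortcut only
```

Because the title is the key, registering the same business process for a second subfunction adds a shortcut instead of a second description.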




Figure 5-33. Process Decomposition for CAB (2 of 2) CF182.0

Notes:

The next iteration for subfunction Itinerary Management does not result in groups of business processes. Rather, it immediately provides the business processes for Itinerary Management. This illustrates that the number of iterations required may vary for different parts of the process tree.

This part of the process tree also shows a business process, Display All Legs of Itinerary, whose implementation might use another business process (Display Single Leg of Itinerary). Again, the process tree is not supposed to show such implementation details.

Process Decomposition for CAB (2 of 2)

AVIATION
    AIRPORT MANAGEMENT
    ITINERARY MANAGEMENT
        Create Itinerary
        Change Itinerary
        Remove Itinerary
        Display Itinerary
        Add Single Leg to Itinerary
        Add All Legs for Itinerary
        Change Leg of Itinerary
        Remove Leg of Itinerary
        Display Single Leg of Itinerary
        Display All Legs of Itinerary
        Change Aircraft Model for Leg of Itinerary
        Display Flights for Leg of Itinerary
    FLIGHT MANAGEMENT
    AIRCRAFT MANAGEMENT
    AIRCRAFT MAINTENANCE




Checkpoint


a. The data inventory describes all data for the application domain.

b. The data inventory is jointly established by the application domain expert and the database designer.

c. The database designer is not involved at all in the development of the data inventory.

d. The data inventory should also describe data caused by the implementation of the business processes, but not having a business meaning.

e. The data inventory is input both for the database designer and application programmers.

f. When establishing the data inventory, the entity-relationship model is checked for completeness.

2. List the components of a data inventory.

_____________________________________________________

_____________________________________________________

_____________________________________________________

3. Describe the difference between a data element and a data group.

_____________________________________________________

_____________________________________________________

_____________________________________________________

4. What is the purpose of abstract data types?

_____________________________________________________

_____________________________________________________

_____________________________________________________




5. Name the three items that should be described for an abstract data type.

_____________________________________________________

_____________________________________________________

_____________________________________________________

6. Which of the following items should you specify for a data group?

a. A unique name.

b. A textual description.

c. Its data type.

d. The data groups using it as components.

e. The entity types using it as attributes.

f. Its minimum, and average lengths.

g. A domain for its values.

7. Why should you associate data elements or data groups with entity types when adding them to the data inventory?

_____________________________________________________

_____________________________________________________

_____________________________________________________

8. Name two methods for establishing a data inventory.

_____________________________________________________

_____________________________________________________

_____________________________________________________

9. List some of the problems with the method of surveying the departments of expertise.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________




10. Which principle is behind coupling the data and process inventories?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________


a. The process inventory describes all data for the application domain.

b. The process inventory describes all business processes for the application domain.

c. The process inventory is input for the database designer.

d. The descriptions of the business processes refer to the data elements and data groups of the data inventory by their unique names.

e. The business processes can be used to verify the completeness of the entity-relationship model.

12. Name at least six items that the description for a business process should include.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

13. What is to be understood by data read for a business process? Which information should be provided for data read?

_____________________________________________________

_____________________________________________________

_____________________________________________________



14. How can you use a business process to verify the entity-relationship model?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

15. What is process decomposition and what is its purpose?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________






Notes:


DATA INVENTORY

• A detailed description of all abstract data types, data elements, and data groups for the application domain

• Created jointly by application domain expert and database designer

• Description of abstract data types includes signature, values, and operations for data type

• Description of data elements/data groups includes:
  - Name, type, textual description, owning data groups and entity types
  - Additionally, for data elements: data type, lengths, and domain

• Methods for establishing a data inventory:
  - Survey of departments of expertise
  - Review of existing data and programs
  - Parallel development with process inventory

• Allows verification of entity-relationship model





Notes:


PROCESS INVENTORY

• A detailed description of all business processes for application domain

• Created by application domain expert for application programmers

• Contains process tree and descriptions for business processes

• Descriptions for business processes include title, purpose, input, textual and formal descriptions, output, data read, data written

• Allows verification of entity-relationship model

• Process tree established by means of process decomposition for application domain
  - Functionally structures business processes for application domain
  - Step-by-step process providing groups of functionally related business processes
  - Provides a complete set of business processes




Unit 6. Tuple Types

This unit describes the purpose of tuple types and how to establish them for entity types and relationship types of an entity-relationship model.



• Explain the purpose of tuple types and position them in the design process.

• Identify the objects of an entity-relationship model for which tuple types are established.

• Establish the tuple types for the appropriate objects of an entity-relationship model.

• Explain the purpose and rules for the normalization of tuple types.

• Normalize the tuple types for an application domain.


Accountability:






Notes:

Up to now, the entity-relationship model and the data and process inventories for the application domain have been established. Now, it is time to transform the information collected so far into objects that are machine processable. This requires a sequence of steps. The first step is to establish the tuple types for the application domain and to normalize them.

In this unit, we will talk about the purpose of tuple types, describe for which objects of the entity-relationship model they are established, and how they are established.

The tuple types established this way may contain anomalies and redundant information and are submitted to a process called Normalization. We will talk about the purpose of normalization and Normal Forms.

Thus, after the completion of this unit, you should be able to establish the tuple types for an application domain and to normalize them.

Unit Objectives

• Explain the purpose of tuple types and position them in the design process

• Identify the objects of an ER model for which tuple types are established

• Establish the tuple types for the appropriate objects of an ER model

• Explain the purpose and rules for the normalization of tuple types

• Normalize the tuple types for an application domain




6.1 Establishing Tuple Types



Figure 6-2. Tuple Types in Design Process CF182.0

Notes:

As part of the conceptual view, the entity-relationship model and the data and process inventories for the application domain were established. The first step of the storage view uses the data inventory to construct tuple types for the entity types and relationship types of the entity-relationship model and normalizes them.

Tuple types are an intermediate result of the design process. They are the precursors of tables and provide the basis for the computerized processing of the entity types and relationship types for the application domain. They are part of the storage view since they represent the first step in the physical implementation of the conceptual view.

You can view the design process as a layered approach transforming the objects of the application domain step-by-step into more and more physical representations. Tuple types are an intermediate result of this transformation process which, finally, results in the tables and related objects of the target relational database management system.

[Visual: Tuple Types in Design Process. Labels on the visual: Problem Statement, Conceptual View, Data Inventory, Process Inventory, Tuple Types, Tables, Indexes, Integrity Rules. The layout of the diagram is not reproduced here.]






Figure 6-3. Tuple Types CF182.0

Notes:

Tuple types are the first result of the storage view. They are established when the entity-relationship model is transformed step-by-step into the physical objects for the target relational database management system. They are not yet the tables for the target system. They are an intermediate result.

Like entity types, tuple types are constructs representing classes of objects with the same meaning, structure, and characteristics. Like entity types, they consist of attributes which may be elementary or composite and can assume zero, one, or multiple values. Nevertheless, they are not entity types. Rather, they can be seen as a generalization or standardization of entity types and relationship types.

Tuple types form the basis for the computerized processing of the objects they represent. Whereas entity types and relationship types were purely conceptual classes, tuple types should be seen as semi-physical constructs. They can be compared with logical files, as further discussed on the subsequent visual.

A specific instance of a tuple type is referred to as a tuple.

Tuple Types

Tuple Type
A construct:
• Representing a class of objects with the same meaning, structure, and characteristics
• Consisting of a set of attributes
• Forming the basis for the computerized processing of the objects belonging to the tuple type

Tuple
A specific instance of a given tuple type




In the literature, tuple types are frequently referred to as relations. We have chosen the term tuple types to avoid confusion with relationships and relationship types and to emphasize that they are classes of tuples.





Figure 6-4. Characteristics of Tuple Types CF182.0

Notes:

Tuple types can be viewed as logical data sets. They are the logical containers for the structured information represented by the tuples. Accordingly, the tuples can be viewed as logical records. They are the computational units being processed. The attributes of a tuple type determine the structure and contents of the tuples.

As logical records of logical data sets, all tuples of a tuple type have the same meaning, structure, and characteristics. This means that they are composed of the same type of information (attributes) and that the same constraints apply to them.

The attributes for a tuple type can be elementary or composite attributes. For a tuple, an attribute can assume zero, one, or multiple values. The cardinality for the attribute determines how many values the attribute must assume at least and at most for each tuple.

Each tuple type must have a set of attributes whose values uniquely identify all potential tuples of the tuple type. This set of attributes is referred to as the primary key of the tuple type.

For reference purposes, each tuple type receives a unique name. This should be the unique class name expressing the function of the tuples.

Characteristics of Tuple Types

• Tuple types can be viewed as logical data sets
  • Tuples form computational units and can be viewed as logical records
  • Attributes determine contents of logical records
• All tuples of a tuple type have the same meaning, structure, and characteristics
• Attributes can be elementary or composite
• Attributes can assume zero, one, or multiple values for a tuple
  • Cardinality determines number of values
• Tuple type must have a set of attributes uniquely identifying its potential tuples
  • Primary key
• Each tuple type receives a unique name
• Tuple types established for entity types and most relationship types

[Visual labels: Entity Types, Relationship Types, Tuple Types, Tables]




In the design process, tuple types are an intermediate result in the process of transforming the entity types and relationship types of the entity-relationship model into tables of the target relational database management system. Tuple types are established for all entity types and most relationship types. As for entity types, the attributes of tuple types are affiliated with data elements and data groups of the data inventory. Basically, when establishing a tuple type, the data elements and data groups corresponding to the attributes or defining attributes of the associated entity type or relationship type are compiled.





Figure 6-5. Tuple Types for Entity Types CF182.0

Notes:

For every entity type of the entity-relationship model for the application domain, one tuple type is established.

As the name of the tuple type, we will use the name of the entity type. If you wish, you can use a different name, but there is no need for that.

The tuple type consists of all attributes for the entity type. Thus, when forming the tuple type, the data elements and data groups of the data inventory corresponding to the attributes of the entity type are compiled and cardinalities assigned to them.

The primary key for the tuple type consists of the attributes for the entity key of the entity type. Since entity keys satisfy the minimum principle (all attributes are necessary for the unique identification of the entity instances), the primary key also follows the minimum principle: All attributes are necessary for the unique identification of the individual tuples.

As you can imagine, the constraints for the entity type must be translated into equivalent constraints for the tuple type. However, at this point in time, we will not worry about the constraints since tuple types are only an intermediate result.

Tuple Types for Entity Types

• ONE tuple type for EVERY entity type of entity-relationship model for application domain
• Name for tuple type = Name for entity type
• Tuple type consists of all attributes for entity type
• Attributes of entity key become attributes of primary key for tuple type
• Equivalent constraints as for entity type

AIRCRAFT MODEL
    Dimensions
        Length
        Height
        Wing Span
    . . .




Note that we have already prepared the establishment of tuple types for entity types. In the data inventory, we have recorded to which entity types the various data elements and data groups belong. Thus, to obtain the tuple type for an entity type, you just need to compile the data elements and data groups for the entity type.
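The compilation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes `InventoryItem` and `TupleType`, the function `tuple_type_for_entity_type`, and the way the data inventory is represented are hypothetical and not part of the course method. The attribute names follow tuple type ITINERARY as documented later in this unit.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryItem:
    """One data element or data group of a (hypothetical) data inventory."""
    name: str
    entity_type: str               # entity type the item was recorded for
    in_entity_key: bool = False    # True if the item belongs to the entity key
    cardinality: tuple = (1, 1)    # (minimum, maximum)


@dataclass
class TupleType:
    name: str
    attributes: list = field(default_factory=list)
    primary_key: list = field(default_factory=list)
    cardinalities: dict = field(default_factory=dict)


def tuple_type_for_entity_type(entity_type, inventory):
    """Compile all inventory items recorded for the entity type."""
    items = [item for item in inventory if item.entity_type == entity_type]
    return TupleType(
        name=entity_type,  # name for tuple type = name for entity type
        attributes=[item.name for item in items],
        # attributes of the entity key become the primary key
        primary_key=[item.name for item in items if item.in_entity_key],
        cardinalities={item.name: item.cardinality for item in items},
    )


# Illustrative inventory entries (assumed representation).
inventory = [
    InventoryItem("Flight Number", "ITINERARY", in_entity_key=True),
    InventoryItem("Established On", "ITINERARY"),
    InventoryItem("Effective From", "ITINERARY", cardinality=(0, 1)),
    InventoryItem("Effective Until", "ITINERARY", cardinality=(0, 1)),
    InventoryItem("Maintenance Number", "MAINTENANCE RECORD", in_entity_key=True),
]

itinerary = tuple_type_for_entity_type("ITINERARY", inventory)
print(itinerary.name, itinerary.attributes, itinerary.primary_key)
```

Constraints are deliberately left out of the sketch, in line with the decision above not to worry about them at this stage.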





Figure 6-6. Tuple Types for Relationship Types CF182.0

Notes:

As with tuple types for entity types, we will choose the full name of the relationship type as the name of the corresponding tuple type.

Since it describes the relationship type, the tuple type must consist of all defining attributes for the relationship type. Thus, to form the tuple type, the data elements and data groups of the data inventory for the defining attributes of the relationship type are compiled. Since a tuple expresses a single relationship, all attributes must assume one and only one value for each tuple. Accordingly, minimum cardinality and maximum cardinality must be 1 for all attributes of the tuple type.

As you would expect, the attributes of the relationship key become the attributes of the primary key for the tuple type. Since the relationship key had to follow the minimum principle, the primary key for the tuple type follows the minimum principle as well: All attributes are required to uniquely identify the individual tuples of the tuple type.

The example on the visual shows relationship type AIRCRAFT MODEL_for_AIRCRAFT, a 1:m relationship type. As you know, the defining attributes for this relationship type are the entity keys of its source and target: attributes Type Code and Model Number from AIRCRAFT MODEL and attribute Aircraft Number from AIRCRAFT. They become the attributes of tuple type AIRCRAFT MODEL_for_AIRCRAFT.

Tuple Types for Relationship Types

• Usually, ONE tuple type for each relationship type
  • But NONE for ...
• Name for tuple type = Full name for relationship type
• Tuple type consists of defining attributes for relationship type
• Attributes of relationship key become attributes of primary key
• Equivalent constraints as for relationship type

[Visual: AIRCRAFT MODEL _for_ AIRCRAFT, cardinalities 1..1 and m; Aircraft Number is marked K in AIRCRAFT]
Defining Attributes: Type Code, Model Number, Aircraft Number
Relationship Key: Aircraft Number

Since the relationship type is a 1:m relationship type, Aircraft Number, the entity key of AIRCRAFT, becomes the relationship key. Therefore, it also becomes the primary key for tuple type AIRCRAFT MODEL_for_AIRCRAFT.

Any constraints for the relationship type must be translated into equivalent constraints for the tuple type. Again, we will not worry about them right now.
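The rules above for the attributes and the primary key can be sketched as a small Python function. It is a minimal illustration under assumptions: the function name, its parameters, and the dictionary layout are hypothetical. The rule that the key of the "many" side becomes the relationship key of a 1:m relationship type is generalized from the AIRCRAFT MODEL_for_AIRCRAFT example, and 1:1 relationship types are not covered.

```python
def tuple_type_for_relationship_type(full_name, source_key, target_key,
                                     source_is_many, target_is_many):
    """Derive a tuple type from the keys of source and target (a sketch)."""
    # Defining attributes: the keys of source and target.
    attributes = list(source_key) + list(target_key)
    if source_is_many and target_is_many:
        # m:m: the relationship key consists of all defining attributes.
        primary_key = list(attributes)
    elif target_is_many:
        primary_key = list(target_key)
    elif source_is_many:
        primary_key = list(source_key)
    else:
        raise ValueError("1:1 relationship types are not covered by this sketch")
    return {
        "name": full_name,  # full name of the relationship type
        "attributes": attributes,
        "primary_key": primary_key,
        # a tuple expresses a single relationship: every attribute is [1..1]
        "cardinalities": {name: (1, 1) for name in attributes},
    }


aircraft_model_for_aircraft = tuple_type_for_relationship_type(
    "AIRCRAFT MODEL_for_AIRCRAFT",
    source_key=["Type Code", "Model Number"],
    target_key=["Aircraft Number"],
    source_is_many=False,
    target_is_many=True,
)
print(aircraft_model_for_aircraft["attributes"])
print(aircraft_model_for_aircraft["primary_key"])  # ['Aircraft Number']
```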

Up to now, we have only described how to establish the tuple type for a relationship type. We have not yet answered the question of whether there is a tuple type for every relationship type of the entity-relationship model. Usually, there is one tuple type for a relationship type. However, there are some exceptions. For some relationship types, there must not be a tuple type. These cases are described by the subsequent visuals.





Figure 6-7. No Tuple Type for Relationship Type (1 of 3) CF182.0

Notes:

An owning relationship type connects an entity type or relationship type, the parent, to a dependent entity type. (The rectangle with rounded corners indicates that the represented object may be an entity type or a relationship type.) The key of the parent is part of the key of the dependent entity type and only instances with matching values are interconnected.

As we have seen before, the defining attributes for an owning relationship type consist of the key of the dependent entity type. Accordingly, the tuple type for the owning relationship type would just consist of the key of the dependent entity type.

The tuple type for the dependent entity type also contains the key for the dependent entity type. Since only instances with matching values are interconnected, the tuple type for the dependent entity type expresses all interconnections, and only those, established via the owning relationship type. Consequently, a tuple type for the owning relationship type would be redundant. Therefore, a tuple type for the owning relationship type need not and must not be provided.

No Tuple Type for Relationship Type (1 of 3)

[Visual: Parent (may be entity type or relationship type) connected by an owning relationship type to a dependent entity type; labels Parent and D]

• Key of dependent entity type includes key of parent
• Only instances with matching values interconnected
• Defining Attributes: Key of dependent entity type
• Tuple type for dependent entity type already expresses owning relationship type
• Tuple type for owning relationship type would be redundant
• NO tuple type for owning relationship type





Notes:

There is a second case when a tuple type must not be provided for a relationship type.

Assume that you have an m:m relationship type r1 which is the source of another relationship type r2 with a minimum target cardinality of 1 (cardinality 1..). Because r1 is an m:m relationship type, its relationship key consists of all its defining attributes. Consequently, the defining attributes of r2 include the defining attributes of r1. Accordingly, the tuple type for r2 includes all attributes of the tuple type for r1.

The target cardinality of 1.. of relationship type r2 implies that each instance of r1 is connected to at least one target instance of r2. In turn, this entails that r2 contains an instance for every instance of r1.

This means that the tuple type for r2 completely describes the instances for r1 and r2. As a consequence, a tuple type for r1 would be redundant and, therefore, need not and must not be provided.


No Tuple Type for Relationship Type (2 of 3)

[Visual: r1, an m:m relationship type (cardinalities ..m and ..m) between entity types or relationship types, is the source of r2 with target cardinality 1..]

• Relationship Key of r1: Defining attributes of r1
• Defining Attributes of r2: Defining attributes of r1, . . .
• Tuple type for r2 includes defining attributes for r1
• For every instance of r1, there is a corresponding instance for r2
• Tuple type for r2 expresses all relationship instances for r1
• Tuple type for r1 would be redundant
• NO tuple type for r1




Note that it is imperative that the minimum target cardinality of r2 be 1. Otherwise, there would not necessarily be an instance of r2 for every instance of r1. Thus, the instances of r2 would not describe all instances of r1 and a separate tuple type would be required for r1.

Note that it is also necessary that r1 be an m:m relationship type. Otherwise, the key of r1 would not consist of all defining attributes. Thus, the tuple type for r2 would not include all defining attributes for r1 and, therefore, would not completely describe the instances for r1.

Of course, a tuple type would also not be required if r1 were the target of a relationship type r2 with a minimum source cardinality of 1. This case can be reduced to the case discussed above by redefining the primary direction of r2.





Notes:

This visual combines the cases discussed on the previous two visuals. Thus, it is a corollary of them.

If r1 is an m:m relationship type and r2 an owning relationship type with a minimum cardinality of 1 for the dependent entity type, tuple types must not be provided for them.

A tuple type must not be provided for r1 because the tuple type for r2 would fully describe all instances for r1. A tuple type for r2 must not be provided either because the tuple type for the dependent entity type completely describes the appropriate instances.

In particular, the situation on the visual exists for mandatory nondefining attributes for m:m relationship types.
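The three cases can be summarized as a decision rule. The following Python sketch is an illustration only: the class `RelationshipType`, its fields, and the function `needs_tuple_type` are hypothetical names, and the sketch assumes that the primary direction of each relationship type has already been chosen so that an m:m relationship type appears as the source, not the target, of the relationship type that makes its tuple type redundant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationshipType:
    name: str
    is_owning: bool = False
    is_many_to_many: bool = False
    source: Optional[str] = None     # name of the source if it is a relationship type
    min_target_cardinality: int = 0


def needs_tuple_type(rel, all_relationship_types):
    """Return False for the cases in which a tuple type must not be provided."""
    if rel.is_owning:
        # Case 1: the tuple type for the dependent entity type already
        # expresses the owning relationship type.
        return False
    if rel.is_many_to_many:
        for other in all_relationship_types:
            # Case 2 (and case 3 if `other` is an owning relationship type):
            # rel is the source of a relationship type with a minimum
            # target cardinality of 1.
            if other.source == rel.name and other.min_target_cardinality >= 1:
                return False
    return True


relationship_types = [
    RelationshipType("PILOT_assigned_to_FLIGHT", is_many_to_many=True),
    RelationshipType("PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT",
                     is_owning=True,
                     source="PILOT_assigned_to_FLIGHT",
                     min_target_cardinality=1),
    RelationshipType("MECHANIC_scheduled_for_AIRCRAFT", is_many_to_many=True),
]

required = [r.name for r in relationship_types
            if needs_tuple_type(r, relationship_types)]
print(required)  # ['MECHANIC_scheduled_for_AIRCRAFT']
```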


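No Tuple Type for Relationship Type (3 of 3)

[Visual: r1, an m:m relationship type (cardinalities ..m and ..m) between entity types or relationship types, is connected by owning relationship type r2 (cardinality 1..) to a dependent entity type; label D]

• NO tuple type for r1
• NO tuple type for r2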





Figure 6-10. Required Tuple Types for CAB CF182.0

Notes:

The above visual shows for which entity types and relationship types of our sample airline company, Come Aboard, tuple types are required:

• Tuple types are required for all entity types of the entity-relationship model for Come Aboard. Therefore, the entity types are shown in reverse video.

• Because they are owning relationship types, tuple types must not be provided for relationship types:

AIRCRAFT TYPE_for_AIRCRAFT MODEL
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG
FLIGHT_for_LEG
PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT
EMPLOYEE_is_MECHANIC
EMPLOYEE_is_PILOT

The arrows for these relationship types have been shaded.

Required Tuple Types for CAB

[Visual: entity-relationship model for Come Aboard. Entity types: AIRCRAFT TYPE, AIRCRAFT MODEL, AIRCRAFT, AIRPORT, ITINERARY, LEG, FLIGHT, EMPLOYEE, PILOT, MECHANIC, PILOT ASSIGNMENT, MAINTENANCE RECORD. Relationship type connectors: _for_, _nonstop_to_, _in_, _as_, _assigned_to_, _by_, _is_, _can_fly_, _can_land_at_, _trained_for_, _scheduled_for_, _from_, _belongs_to_]




Note that the is-bundle for supertype EMPLOYEE represents a set of relationship types (two). All of them are owning relationship types.

• Tuple types must not be provided for relationship types:

AIRPORT_nonstop_to_AIRPORT_in_ITINERARY
PILOT_assigned_to_FLIGHT

They are m:m relationship types that are the source of other relationship types whose minimum target cardinality is 1. Both are cases of m:m relationship types with mandatory nondefining attributes.

The arrows for these relationship types have been shaded.

• Tuple types are required for all remaining relationship types. Their connecting arrows have been highlighted.





Figure 6-11. Documentation of Tuple Types CF182.0

Notes:

As described before, tuple types consist of attributes. A tuple type for an entity type consists of the attributes of the entity type. A tuple type for a relationship type consists of the defining attributes for the relationship type. To describe a tuple type, you need to list its attributes in a way that reflects that attributes may be composite.

Each line on the visual following the name for the tuple type represents an attribute. The components of a composite attribute immediately follow the line for the composite attribute itself and are indented. If a component is again a composite attribute, its components are indented even further.

In the example, Manufacturer is a composite attribute having composite attribute Address as a component.

To highlight the name of the tuple type, it is in boldface and has been underlined. The names of composite attributes have been bold-faced as well.

Attributes belonging to the primary key are marked by the letters PK separated from the name of the attribute by a comma. If a composite attribute belongs to the primary key (i.e., all its components belong to the primary key), the name of the composite attribute is marked with the letters PK rather than the individual components.

Documentation of Tuple Types

AIRCRAFT TYPE
    Type Code, PK
    Category
    Number of Engines
    Manufacturer
        Manufacturer Code
        Company Name
        Phone Number
        Address
            Street [0..1]
            City
            Country
            Postal Code [0..1]
            State [0..1]
            Post Office Box [0..1]

• Name of tuple type
• Belongs to primary key
• Composite attributes
• Components of composite attributes are indented
• Cardinality of attribute: Minimum and maximum number of values for each instance
  • Format: [ minimum .. maximum ]
  • * for maximum if no upper limit
  • [1..1] assumed if omitted
• Cardinalities are relative!!

As discussed before, attributes have cardinalities specifying the minimum number and maximum number of values that an instance of the attribute must/can have. As part of the documentation of a tuple type, we want to show the cardinalities for its attributes. They are needed during normalization and in later steps of the design process.

The cardinality for an attribute follows its name or, if applicable, the letters PK and is specified as follows: Minimum cardinality and maximum cardinality are separated by two periods and enclosed in brackets:

[minimum .. maximum]

If there is no upper limit for the number of values the attribute can assume, an asterisk (*) is used as maximum cardinality. Enclosing brackets are used in analogy to the dimension specification for arrays in programming languages.

If the cardinality for an attribute is omitted, [1..1] is assumed.

Note that the specified cardinalities are relative: If the attribute is a direct component of the tuple type, the cardinality expresses how many values the attribute must/can assume for each tuple of the tuple type. If the attribute is a component of a composite attribute, the cardinality instead specifies how many values the attribute must/can assume within each instance of the composite attribute.

As a consequence, it is possible that, despite a minimum cardinality of 1, an attribute does not assume a value for a specific tuple! This happens if the owning composite attribute has a minimum cardinality of 0 and does not assume a value for a tuple.

The cardinality for an attribute can be derived from the cardinality specifications for the data element or data group the attribute is based upon. Thus, it would be possible to omit the cardinalities in the documentation of a tuple type and to go back to the data inventory when the cardinalities are needed. However, it is quite handy to have the cardinalities in the tuple type documentation.
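The notation and the effect of relative cardinalities can be made concrete with a short Python sketch. The function names `parse_attribute_line` and `may_lack_value` and the returned dictionary layout are assumptions chosen for illustration. Only the notation itself (the letters PK, `[minimum .. maximum]`, `*`, and the default `[1..1]`) comes from the text above.

```python
import re

# Matches cardinalities such as [0..1], [1..1], or [0..*].
CARDINALITY = re.compile(r"\[\s*(\d+)\s*\.\.\s*(\d+|\*)\s*\]")


def parse_attribute_line(line):
    """Split a documentation line such as 'Street [0..1]' or 'Type Code, PK'."""
    text = line.strip()
    minimum, maximum = 1, 1              # [1..1] is assumed if omitted
    match = CARDINALITY.search(text)
    if match:
        minimum = int(match.group(1))
        # None stands for '*', i.e., no upper limit
        maximum = None if match.group(2) == "*" else int(match.group(2))
        text = text[:match.start()].strip()
    is_pk = text.endswith(", PK")
    if is_pk:
        text = text[:-len(", PK")].strip()
    return {"name": text, "pk": is_pk, "min": minimum, "max": maximum}


def may_lack_value(cardinality_path):
    """Relative cardinalities: an attribute may have no value for a tuple
    if it or any owning composite attribute has a minimum cardinality of 0.

    cardinality_path lists (minimum, maximum) pairs from the direct
    attribute of the tuple type down to the attribute itself.
    """
    return any(minimum == 0 for minimum, _ in cardinality_path)


print(parse_attribute_line("Type Code, PK"))
print(parse_attribute_line("Street [0..1]"))
print(parse_attribute_line("Seat [0..*]"))

# Departure Time is [1..1] within Actual Departure, which itself is [0..1].
print(may_lack_value([(0, 1), (1, 1)]))  # True
```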





Figure 6-12. Tuple Types With Roles CF182.0

Notes:

Generally, an attribute of a tuple type receives the same name as the data element or data group it is based upon. However, you might want to give it a different name. In some cases, you even have to. If a data element or data group is used by multiple attributes at the same level in different roles, you need to give the attributes different names. Same level in this context means as direct components of the tuple type or of a composite attribute.

For example, in tuple type FLIGHT, data element Airport Code is used twice as direct attribute of the tuple type. Once it is used as airport code for the airport of departure, once as airport code for the airport of arrival. Without naming them differently, the two roles could not be differentiated. In the data inventory for Come Aboard, the two roles for the data element have been identified with different role names (From and To). Therefore, the names of the attributes should be the role names.

However, you still want to keep the link to the appropriate data element or data group in the data inventory. You can achieve this by specifying the data element or data group name and the attribute name by means of an AS clause as done on the visual.

Tuple Types With Roles

FLIGHT
    Flight Number, PK
    Airport Code AS From, PK
    Airport Code AS To, PK
    Flight Locator, PK
    Departure AS Planned Departure
        Departure Date
        Departure Time
    Arrival AS Planned Arrival
        Arrival Date
        Arrival Time
    Departure AS Actual Departure [0..1]
        Departure Date
        Departure Time
    Arrival AS Actual Arrival [0..1]
        Arrival Date
        Arrival Time

• Name of Data Element AS Attribute/Role Name
• Name of Data Group AS Attribute/Role Name
• Qualified Name: Departure Time OF Actual Departure




In addition to data element Airport Code, tuple type FLIGHT uses data groups Departure and Arrival in different roles. The different usages of data group Departure have been highlighted. Departure is used as planned departure (role/attribute name Planned Departure) and as actual departure (role/attribute name Actual Departure).

Data group Departure contains data elements Departure Date and Departure Time. This raises the question of whether the attributes for the different usages must be named differently. They need not be, because they are components of differently named composite attributes and are unique within the scope of each composite attribute.

Formally, the full name of a component is qualified by the name of the composite attribute. For example, the full name of attribute Departure Time of composite attribute Actual Departure is Departure Time OF Actual Departure.
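The naming rules can be checked mechanically. The following Python sketch is an illustration under assumptions: the functions `split_as_clause` and `qualified_names` and the nested-list representation of a tuple type are hypothetical. Chaining several OF qualifiers for components nested more than one level deep is also an assumption, since the text only shows a single level.

```python
def split_as_clause(entry):
    """Return (data inventory name, attribute name) for 'X AS Y' or plain 'X'."""
    base, separator, role = entry.partition(" AS ")
    return (base, role) if separator else (base, base)


def qualified_names(attributes, owner=None):
    """List the full names of all attributes of a nested attribute listing.

    attributes is a list of (entry, components) pairs, where components
    has the same shape and is empty for elementary attributes.
    """
    names = []
    for entry, components in attributes:
        _, attribute_name = split_as_clause(entry)
        full_name = attribute_name if owner is None else f"{attribute_name} OF {owner}"
        names.append(full_name)
        names.extend(qualified_names(components, owner=full_name))
    return names


departure = [("Departure Date", []), ("Departure Time", [])]
flight = [
    ("Airport Code AS From", []),
    ("Airport Code AS To", []),
    ("Departure AS Planned Departure", departure),
    ("Departure AS Actual Departure", departure),
]

names = qualified_names(flight)
for name in names:
    print(name)

# With role names and qualification, every attribute name is unique.
print(len(names) == len(set(names)))  # True
```

Without the AS clauses, Airport Code would appear twice at the same level and the final check would fail, which is exactly the situation the role names resolve.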





Figure 6-13. Some Sample Tuple Types for CAB CF182.0

Notes:

The above visual illustrates some further tuple types for our sample airline company called Come Aboard. The tuple types in the upper box are for entity types. Tuple type AIRCRAFT MODEL has only mandatory attributes, i.e., all attributes have a minimum cardinality of 1. They also all have a maximum cardinality of 1.

Tuple type ITINERARY has only a few attributes. You may notice that the (starting) weekdays on which the itinerary is operated, as described in the problem statement for Come Aboard, are missing. A closer examination reveals that the weekdays on which itineraries are operated are not inherent characteristics of itineraries. Rather, they are characteristics of the legs for itineraries. The (starting) weekdays for an itinerary can be derived from the (starting) weekdays for its legs.

Only a few attributes of tuple type MAINTENANCE RECORD are shown. Note that the tuple type contains an attribute Aircraft Number expressing to which aircraft the maintenance record belongs. In Unit 4 - Entity-Relationship Model, we determined that the interrelationship between maintenance records and aircraft could not be expressed by means of a relationship type. It had to be expressed by an attribute. This is reflected in the tuple type.

Some Sample Tuple Types for CAB

Tuple types for entity types (upper box):

AIRCRAFT MODEL
    Type Code, PK
    Model Number, PK
    Dimensions
        Length
        Height
        Wing Span
    Weights
        Net Weight
        Maximum Weight
    Cruising Speed

ITINERARY
    Flight Number, PK
    Established On
    Effective From [0..1]
    Effective Until [0..1]

MAINTENANCE RECORD
    Maintenance Number, PK
    Aircraft Number
    Date of Maintenance
    Type of Maintenance
    . . .

Tuple types for relationship types (lower box):

AIRCRAFT_for_FLIGHT
    Flight Number, PK
    Airport Code AS To, PK
    Flight Locator, PK
    Aircraft Number

MAINTENANCE RECORD_from_MECHANIC
    Maintenance Number, PK
    Employee Number

MECHANIC_scheduled_for_AIRCRAFT
    Employee Number, PK
    Aircraft Number, PK

The lower box on the visual contains tuple types for relationship types for Come Aboard. As a principle, all attributes of tuple types for relationship types have a cardinality of [1..1].

AIRCRAFT_for_FLIGHT and MAINTENANCE RECORD_from_MECHANIC are tuple types for 1:m relationship types. Accordingly, their relationship keys consist of only some of the defining attributes, and only some of the attributes of these tuple types belong to their primary keys.

Tuple type MECHANIC_scheduled_for_AIRCRAFT is for an m:m relationship type. Therefore, all its attributes belong to the primary key.





Figure 6-14. A Special Consideration CF182.0

Notes:

It is possible that the tuple type for an entity type consists of just the primary key. However, this is rather unusual. Therefore, you should discuss with the application domain expert whether the entity type in question is really necessary. When establishing the problem statement for the application domain, the application domain expert might have thought that there would be information of that type. However, the data inventory may not contain any data elements and data groups for the entity type.

If the application domain expert agrees, remove the entity type from the entity-relationship model, adjust the relationship types using the entity type as source or target accordingly, and correct the tuple types.

However, you really should examine the case carefully. The mere existence of an entity type, or its use by relationship types as source or target, already constitutes information that you may lose by removing the entity type. An entity type represents a class of objects with the same meaning and characteristics. Being an instance of that class identifies the object as a member of the class even if there are no further characteristics to be stored for the object.

A Special Consideration

• Resulting tuple type for entity type may just consist of primary key
• Check with application domain expert if entity type is really necessary
• Remove entity type from entity-relationship model if not required by application domain
  • Change relationship types accordingly
  • Change tuple types accordingly
• But be careful!!!
  • As such, the pure existence of an entity type or the existence of a relationship type using it as source or target are information that you may lose



6.2 Normalization of Tuple Types



Figure 6-15. Normalization - An Introduction CF182.0

Notes:

The tuple types established so far may have the following problems:

• They may have attributes with a maximum cardinality other than 1, i.e., have repeating groups. A tuple type with repeating groups cannot immediately be converted into a table. This is because the columns of tables can only accept a single value.

• Even within tuple types, redundant information may be stored. This may lead to inconsistencies between the tuples of a tuple type if update operations do not change all affected tuples.

• The tuple types may contain insert, update, and delete anomalies. These anomalies may prevent the storage of information, cause inconsistent tuples, or result in the loss of information.

Normalization remedies these deficiencies within, but not across tuple types. It improves the condition of the tuple types by raising their quality level step-by-step.
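To make the second problem tangible, the following Python sketch shows how redundant information within one tuple type can become inconsistent when an update does not change all affected tuples. Everything in it is hypothetical: the tuple type combining aircraft and model information, the values, and the helper `conflicting_values` are invented for illustration and do not belong to the Come Aboard design.

```python
from collections import defaultdict


def conflicting_values(tuples, determinant, dependent):
    """Find determinant values for which the dependent attribute disagrees."""
    values_seen = defaultdict(set)
    for row in tuples:
        values_seen[row[determinant]].add(row[dependent])
    return {key: values for key, values in values_seen.items() if len(values) > 1}


# Hypothetical tuples: Cruising Speed is a characteristic of the model,
# so it is stored redundantly for every aircraft of that model.
aircraft = [
    {"Aircraft Number": "A1", "Model Number": "M10", "Cruising Speed": 800},
    {"Aircraft Number": "A2", "Model Number": "M10", "Cruising Speed": 800},
    {"Aircraft Number": "A3", "Model Number": "M20", "Cruising Speed": 900},
]

before = conflicting_values(aircraft, "Model Number", "Cruising Speed")

# An update operation that changes only one of the affected tuples.
aircraft[0]["Cruising Speed"] = 820

after = conflicting_values(aircraft, "Model Number", "Cruising Speed")

print(before)  # {}
print(after)   # model M10 now has two different cruising speeds
```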

There are five quality levels defined for tuple types by means of Normal Forms. These Normal Forms are referred to as First Normal Form, Second Normal Form, and so on.

Normalization - An Introduction

Established Tuple Types . . .
• Generally, cannot be converted one-to-one into tables
  • Attributes can assume multiple values whereas columns cannot
• May contain redundant information
  • May lead to inconsistent tuples
• May contain insert, update, and delete anomalies
  • Information cannot be stored because of missing unrelated information
  • Information may become inconsistent due to updates
  • Information may be lost when a tuple is deleted

Normalization
• Improves condition of tuple types by raising their quality level

Normal Forms define quality levels of tuple types
• Five Normal Forms: 1st Normal Form through 5th Normal Form
• Subsequent Normal Form based on previous Normal Form
• The higher the Normal Form the better the quality of the tuple type
• Only first three Normal Forms of practical relevance




Each subsequent Normal Form requires that the previous Normal Form be satisfied together with some additional conditions. Thus, the higher the Normal Form for a tuple type, the better and more stable it is and the fewer of the above-mentioned problems may occur.

Only the first three Normal Forms are of practical relevance. Hardly anyone verifies that their tuple types satisfy the Fourth Normal Form or even the Fifth Normal Form. Both Normal Forms deal with n-ary many-to-many relationship types and are more of a theoretical nature. They are very complex and violations are extremely hard to detect.

Normally, when establishing tuple types based on an entity-relationship model with only binary relationship types, you should not have violations of the Fourth Normal Form or the Fifth Normal Form. This assumes that you have dutifully identified your relationship types and not hidden and combined them in artificial entity types.

Because of the limited practical value of the remaining Normal Forms, we will concentrate on the first three Normal Forms. However, to illustrate how difficult it is to verify the higher Normal Forms, we will address the Fourth Normal Form as well, but skip the Fifth Normal Form.




Figure 6-16. First Normal Form - Definition CF182.0

Notes:

The First Normal Form deals with repeating groups. That is, it deals with attributes having a maximum cardinality higher than 1 (considering *, meaning unlimited, also as higher than 1). Repeating groups represent a problem when mapping tuple types into tables because the columns of tables only allow a single value. Therefore, the First Normal Form requires that all attributes, elementary or composite, have at most one value. An attribute is allowed to have no value for some tuples.

Tuple type ITINERARY for our sample airline company called Come Aboard is in First Normal Form. None of its attributes has a maximum cardinality higher than 1.

Tuple type AIRCRAFT violates the First Normal Form because it contains a repeating group. Composite attribute Seat has a maximum cardinality of *. That is, an aircraft can have many seats and an upper limit has not been established.

Since Seat is a composite attribute, its values are composed of values for its components. Each value of Seat consists of a value for Seat Number, Seat Location, Seat Class, and Section. Effectively, this means that these attributes assume multiple values as well, namely, as many as the composite attribute.
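The definition translates directly into a check on maximum cardinalities. The Python sketch below is a minimal illustration: the class `Attribute` and the functions `repeating_groups` and `is_in_first_normal_form` are hypothetical names, and `None` is used as an assumed encoding of the maximum cardinality `*`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    name: str
    minimum: int = 1
    maximum: Optional[int] = 1          # None stands for '*' (no upper limit)
    components: List["Attribute"] = field(default_factory=list)


def repeating_groups(attributes):
    """Names of all attributes, at any level, with a maximum cardinality above 1."""
    found = []
    for attribute in attributes:
        if attribute.maximum is None or attribute.maximum > 1:
            found.append(attribute.name)
        found.extend(repeating_groups(attribute.components))
    return found


def is_in_first_normal_form(attributes):
    """All attributes, elementary or composite, can have at most one value."""
    return not repeating_groups(attributes)


itinerary = [
    Attribute("Flight Number"),
    Attribute("Established On"),
    Attribute("Effective From", minimum=0),
    Attribute("Effective Until", minimum=0),
]

aircraft = [
    Attribute("Aircraft Number"),
    Attribute("Date Manufactured"),
    Attribute("Date in Service", minimum=0),
    Attribute("Seat", minimum=0, maximum=None, components=[
        Attribute("Seat Number"),
        Attribute("Seat Location"),
        Attribute("Seat Class"),
        Attribute("Section"),
    ]),
]

print(is_in_first_normal_form(itinerary))  # True
print(is_in_first_normal_form(aircraft))   # False
print(repeating_groups(aircraft))          # ['Seat']
```

Note that a minimum cardinality of 0, as for Effective From, does not violate the First Normal Form. Only the maximum cardinality matters.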

First Normal Form - Definition

A tuple type is in the First Normal Form if all its attributes, elementary or composite, can have at most one value

In 1st Normal Form:

ITINERARY
    Flight Number, PK
    Established On
    Effective From [0..1]
    Effective Until [0..1]

Not in 1st Normal Form:

AIRCRAFT
    Aircraft Number, PK
    Date Manufactured
    Date in Service [0..1]
    . . .
    Seat [0..*]          (Repeating Group)
        Seat Number
        Seat Location
        Seat Class
        Section





Figure 6-17. First Normal Form - Solution CF182.0

Notes:

You can solve the violation of the First Normal Form as follows:

• Remove attribute Seat from tuple type AIRCRAFT and create a new tuple type SEAT. The new tuple type contains one tuple for each seat on every aircraft. Accordingly, the cardinality of composite attribute Seat in the new tuple type is [1..1].

• So as not to lose the interconnection to aircraft, the new tuple type must contain, for each seat, the serial number of the aircraft to which the seat belongs (attribute Aircraft Number).

• None of the attributes alone can form the primary key for the new tuple type since none uniquely identifies the tuples of the tuple type. Seat numbers are not unique across aircraft. Different aircraft may have the same seat numbers. However, seat numbers are unique per aircraft. Therefore, the primary key must consist of two attributes:

Aircraft Number and Seat Number
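
As a rough sketch of how the two tuple types could end up as tables, the following Python fragment uses the standard `sqlite3` module. The table names, column names, and data types are assumptions made for this illustration; the sample values are taken from the instance example for Come Aboard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE aircraft (
    aircraft_number   TEXT PRIMARY KEY,
    date_manufactured TEXT NOT NULL,
    date_in_service   TEXT                      -- cardinality [0..1]
);
CREATE TABLE seat (
    aircraft_number TEXT NOT NULL REFERENCES aircraft (aircraft_number),
    seat_number     TEXT NOT NULL,
    seat_location   TEXT NOT NULL,
    seat_class      TEXT NOT NULL,
    section         TEXT NOT NULL,
    PRIMARY KEY (aircraft_number, seat_number)  -- seat numbers are unique per aircraft only
);
""")

conn.executemany("INSERT INTO aircraft VALUES (?, ?, ?)", [
    ("B474001323", "1994-10-12", "1997-01-01"),
    ("B171004217", "1999-10-23", "1999-11-15"),
])
conn.executemany("INSERT INTO seat VALUES (?, ?, ?, ?, ?)", [
    ("B474001323", "1A", "WINDOW", "FIRST", "N/SMOKING"),
    ("B171004217", "1A", "WINDOW", "BUSINESS", "N/SMOKING"),  # same seat number, other aircraft
])

try:
    # A second seat 1A on the same aircraft violates the composite primary key.
    conn.execute(
        "INSERT INTO seat VALUES ('B474001323', '1A', 'AISLE', 'FIRST', 'N/SMOKING')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Seat number 1A is accepted once per aircraft, which is exactly what the two-attribute primary key expresses.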

Sometimes, it is necessary to introduce an additional attribute (e.g., a sequence number) to attain the unique identification of the tuples. Sometimes, it is desirable to introduce an additional attribute which, together with other attributes, uniquely identifies the tuples. However, remember that the primary key is used to reference the individual tuples of a tuple type. Therefore, it should be as natural as possible. A time which, together with other attributes, could be used to uniquely identify the tuples is not a good component for a primary key. Who remembers the various times for the tuples?!

First Normal Form - Solution

AIRCRAFT
    Aircraft Number, PK
    Date Manufactured
    Date in Service [0..1]
    . . .

SEAT
    Aircraft Number, PK
    Seat (cardinality [1..1])
        Seat Number, PK
        Seat Location
        Seat Class
        Section

• May need to separate out multiple attributes together
    - With same maximum cardinality
    - Depends on composite attributes

• Resulting tuple types must again be inspected for violations of First Normal Form

When creating the new tuple type, all logically related attributes with the same maximum cardinality should be moved to the same tuple type. If the data groups for the composite attributes of a tuple type were established properly, all logically related attributes should be part of the same composite attribute. In case of our example, they all belong to composite attribute Seat. The composite attribute is then the only one (in addition to the primary key of the original tuple type) to be moved to the new tuple type.

If data groups have not been established at all or have been established improperly, you must determine during normalization which attributes logically belong together and should be moved together. In other words, the data groups must be established in any case. Why not establish them correctly from the start, i.e., when the data inventory is established?!

Repeating groups may be nested and should be resolved from outside in. Thus, a tuple type resulting from normalization must be inspected again for violations of the First Normal Form.





Figure 6-18. First Normal Form - Instance Example CF182.0

Notes:

This visual uses an instance example for the tuple types considered on the previous visuals. The tuple types are represented as tables to illustrate some sample tuples for them.

The top portion of the visual illustrates tuple type AIRCRAFT before normalization. For both tuples shown, composite attribute Seat, and, thus, its components Seat Number, Seat Location, Seat Class, and Section, assume many values. The component values in a line belong together. They form the components of the appropriate value for the composite attribute. (As you may correctly conclude from the visual, the components of a composite attribute become separate columns in the tables of the relational database management system.)

The bottom half of the visual illustrates the situation after normalization. Tuple type AIRCRAFT no longer contains any seat information. The seat information is contained in tuple type SEAT. For each seat on an aircraft, SEAT contains one tuple. Aircraft Number identifies to which aircraft the seat belongs.
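
The step from the top portion to the bottom half of the visual can be mimicked on sample data. The following Python sketch is illustrative only: the function `split_repeating_group` and the helper `seat` are hypothetical names, tuples are assumed to be represented as dictionaries, and only the seats spelled out on the visual are included.

```python
def seat(number, location, seat_class, section):
    """Build one value of composite attribute Seat from its components."""
    return {"Seat Number": number, "Seat Location": location,
            "Seat Class": seat_class, "Section": section}


# Tuple type AIRCRAFT before normalization: Seat assumes many values.
aircraft_before = [
    {"Aircraft Number": "B474001323", "Date Manufactured": "1994-10-12",
     "Date in Service": "1997-01-01",
     "Seat": [seat("1A", "WINDOW", "FIRST", "N/SMOKING"),
              seat("1B", "MIDDLE", "FIRST", "N/SMOKING"),
              seat("1C", "AISLE", "FIRST", "N/SMOKING")]},
    {"Aircraft Number": "B171004217", "Date Manufactured": "1999-10-23",
     "Date in Service": "1999-11-15",
     "Seat": [seat("1A", "WINDOW", "BUSINESS", "N/SMOKING"),
              seat("1B", "AISLE", "BUSINESS", "N/SMOKING")]},
]


def split_repeating_group(tuples, key, group):
    """Move repeating group `group` to a new tuple type.

    Each new tuple repeats the value of `key` so that the connection
    to the original tuple is not lost.
    """
    parents, children = [], []
    for t in tuples:
        parents.append({k: v for k, v in t.items() if k != group})
        for value in t[group]:
            children.append({key: t[key], **value})
    return parents, children


aircraft, seats = split_repeating_group(aircraft_before, "Aircraft Number", "Seat")
print(len(aircraft), len(seats))  # 2 5
```

If the new tuple type itself contained a repeating group, the same function could be applied again to its tuples, which corresponds to resolving nested repeating groups from outside in.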

First Normal Form - Instance Example

BEFORE

AIRCRAFT
Aircraft Number  Date Manufactured  Seat Number  Seat Location  Seat Class  Section    Date in Service
B474001323       1994-10-12         1A           WINDOW         FIRST       N/SMOKING  1997-01-01
                                    1B           MIDDLE         FIRST       N/SMOKING
                                    1C           AISLE          FIRST       N/SMOKING
                                    . . .
                                    46J          WINDOW         ECONOMY     SMOKING
B171004217       1999-10-23         1A           WINDOW         BUSINESS    N/SMOKING  1999-11-15
                                    1B           AISLE          BUSINESS    N/SMOKING
                                    . . .
                                    28G          WINDOW         ECONOMY     N/SMOKING

(Seat Number, Seat Location, Seat Class, and Section are the components of composite attribute Seat.)

AFTER

AIRCRAFT
Aircraft Number  Date Manufactured  Date in Service
B474001323       1994-10-12         1997-01-01
B171004217       1999-10-23         1999-11-15

SEAT
Aircraft Number  Seat Number  Seat Location  Seat Class  Section
B474001323       1A           WINDOW         FIRST       N/SMOKING
B474001323       1B           MIDDLE         FIRST       N/SMOKING
B474001323       1C           AISLE          FIRST       N/SMOKING
. . .
B474001323       46J          WINDOW         ECONOMY     SMOKING
B171004217       1A           WINDOW         BUSINESS    N/SMOKING
B171004217       1B           AISLE          BUSINESS    N/SMOKING
. . .
B171004217       28G          WINDOW         ECONOMY     N/SMOKING

(Seat Number, Seat Location, Seat Class, and Section are the components of composite attribute Seat.)

Figure 6-19. First Normal Form - ER Model Correction CF182.0

Notes:

The fact that a new tuple type has been created to achieve First Normal Form should be reflected in the entity-relationship model. In case of our example, this means that a dependent entity type SEAT for entity type AIRCRAFT must be introduced together with the associated owning relationship type AIRCRAFT_has_SEAT. The entity type is indeed a dependent entity type:

• The key of entity type AIRCRAFT is part of the entity key for SEAT.

• Instances with matching key/key portion values, and only those, are interconnected.

The target cardinality for the owning relationship type is m (0..m) because the cardinality for composite attribute Seat was [0..*] in the original tuple type. This means that there are aircraft without seats (cargo planes). If necessary, go back to the application domain expert to verify the cardinality.

The problem statement for the application domain should be updated as well (by the application domain expert).

First Normal Form - ER Model Correction

Visual: entity-relationship model for Come Aboard with entity types AIRCRAFT, AIRCRAFT MODEL, AIRCRAFT TYPE, AIRPORT, EMPLOYEE, PILOT, MECHANIC, PILOT ASSIGNMENT, MAINTENANCE RECORD, ITINERARY, FLIGHT, and LEG, extended by the new dependent entity type SEAT. SEAT is connected to AIRCRAFT by owning relationship type AIRCRAFT_has_SEAT with a target cardinality of m.

Figure 6-20. First Normal Form - 2nd Example (1 of 2) CF182.0

Notes:

For the Seat example considered so far, there would have been another, but unattractive, solution: You could have introduced a separate tuple in tuple type AIRCRAFT for each value of composite attribute Seat by repeating the corresponding values for the other attributes. However, in this way, you would have created a lot of redundancy, endangering the consistency of the tuples through update operations that do not change all related tuples. Thus, this is not really a solution to be considered.

This visual discusses another possible solution for repeating groups with a low fixed maximum cardinality. Look at the example on the visual. Tuple type AIRCRAFT has another repeating group, namely, the engines belonging to the aircraft. In this repeating group, Manufacturer is again a composite attribute. Its components have not been listed since they are not relevant for the present discussion.

In contrast to the previous example, composite attribute Engine has a low fixed maximum cardinality. Its maximum cardinality is four.

First Normal Form - 2nd Example (1 of 2)

AIRCRAFT (with repeating group)
    Aircraft Number, PK
    . . .
    Engine [0..4] (repeating group)
        Engine Number
        Engine Type
        Manufacturer
        Engine Position

AIRCRAFT (repeating group replaced by individual attributes)
    Aircraft Number, PK
    Date Manufactured
    Date in Service [0..1]
    . . .
    Engine 1 [0..1]
        Engine Number
        Engine Type
        Manufacturer
        Engine Position
    Engine 2 [0..1], Engine 3 [0..1], Engine 4 [0..1] (same components as Engine 1)

Are you really sure that this is the solution???

• Can you control that there will never be more than four engines?

• What about engines not mounted on aircraft?

Go back to the application domain expert and ...

To abolish the repeating group, you could replace Engine by four composite attributes Engine 1, Engine 2, Engine 3, and Engine 4. All of these would have the same components as Engine, but a cardinality of [0..1].

Formally, the violation of the First Normal Form has vanished. However, you should ask yourself if that is really the solution that you want because it has serious limitations and drawbacks:

• Are you really sure that the maximum cardinality will not increase over time? Is it really under your control that the maximum cardinality will not increase or can somebody else just change the rules on you? If the maximum cardinality increases, you need additional attributes reflecting the cardinality increase. This will cause changes in your queries and, especially, your programs because they will handle the various engines individually.

In contrast, if you have a new tuple type with one tuple for each engine of an aircraft, you can use loop processing. If the proper end-of-data conditions are tested, processing can be independent of the number of engines mounted and the maximum number of engines for an aircraft.

• Another question to consider for this solution (as well as for the original tuple type) is: What happens with engines not mounted on an aircraft? Do you not keep the referenced information for them as well? As the entity-relationship model and the tuple type for Come Aboard stand right now, you would not know where to keep information about engines not mounted.
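
The difference between handling the engines individually and loop processing can be made concrete with a small Python sketch. It is an illustration only: the engine numbers are made-up sample values, tuples are assumed to be dictionaries, and the function names are hypothetical.

```python
# Variant 1: four composite attributes Engine 1 .. Engine 4 (only the engine number is shown).
aircraft_wide = {
    "Aircraft Number": "B171004217",
    "Engine 1": "E-0001", "Engine 2": "E-0002", "Engine 3": None, "Engine 4": None,
}


def engines_wide(aircraft_tuple):
    """Every engine attribute is named explicitly.

    A fifth engine would require a new attribute and a change to this code.
    """
    names = ("Engine 1", "Engine 2", "Engine 3", "Engine 4")
    return [aircraft_tuple[n] for n in names if aircraft_tuple[n] is not None]


# Variant 2: a separate tuple type with one tuple per mounted engine.
engine_tuples = [
    {"Aircraft Number": "B171004217", "Engine Number": "E-0001"},
    {"Aircraft Number": "B171004217", "Engine Number": "E-0002"},
    {"Aircraft Number": "B474001323", "Engine Number": "E-0101"},
]


def engines_loop(aircraft_number, tuples):
    """Loop until the data ends; the number of engines does not appear in the code."""
    result = []
    for t in tuples:
        if t["Aircraft Number"] == aircraft_number:
            result.append(t["Engine Number"])
    return result


print(engines_wide(aircraft_wide))                # ['E-0001', 'E-0002']
print(engines_loop("B171004217", engine_tuples))  # ['E-0001', 'E-0002']
```

Both variants return the same engines today, but only the second one keeps working unchanged if the maximum cardinality increases.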

The case on the visual reveals a problem with the conceptual view of your database design, especially with the entity-relationship model. You should go back to the application domain expert and ask him/her whether the engine information must be kept for engines not mounted. If so, you should solve the violation of the First Normal Form by first correcting your entity-relationship model and then changing your tuple types accordingly. This is illustrated on the next visual.





Figure 6-21. First Normal Form - 2nd Example (2 of 2) CF182.0

Notes:

In case of our example, the application domain expert has confirmed that information about engines is also required for engines not mounted on aircraft. Consequently, the engines represent an independent conceptual unit, a class of objects with the same meaning and characteristics. Therefore, they must be represented by an entity type in the entity-relationship model. Accordingly, the entity-relationship model for Come Aboard is incomplete. It should be corrected before the tuple types are corrected:

• An entity type ENGINE is introduced containing elementary attributes Engine Number and Engine Type and composite attribute Manufacturer.

Since the serial numbers for engines are unique across engine manufacturers, Engine Number becomes the entity key for ENGINE.

• In addition to the entity type, a relationship type ENGINE_on_AIRCRAFT must be introduced specifying which engines are mounted on the individual aircraft.

• You may wonder why attribute Engine Position has not been added to entity type ENGINE. Engine Position specifies in which position the appropriate engine is mounted on an aircraft. The engine position is not a characteristic of the engine as such, but rather a characteristic of the relationship linking the engine to an aircraft. Accordingly, Engine Position is a nondefining attribute of relationship type ENGINE_on_AIRCRAFT.

First Normal Form - 2nd Example (2 of 2)

... get your ER model in order!!!

Entity-relationship model:
    AIRCRAFT (Aircraft Number, K; . . .)
    ENGINE (Engine Number, K; . . .)
    ENGINE_on_AIRCRAFT (relationship type between ENGINE and AIRCRAFT)
    ENGINE LOCATION (Engine Number, K; Engine Position), dependent entity type of ENGINE_on_AIRCRAFT; owning relationship type _in_ with target cardinality 1..1 (D, C)

Tuple types:

AIRCRAFT
    Aircraft Number, PK
    . . .

ENGINE
    Engine Number, PK
    Engine Type
    Manufacturer

ENGINE_on_AIRCRAFT
    Engine Number, PK
    Aircraft Number

ENGINE LOCATION
    Engine Number, PK
    Engine Position

ENGINE_on_AIRCRAFT and ENGINE LOCATION combined:

ENGINE_on_AIRCRAFT
    Engine Number, PK
    Aircraft Number
    Engine Position

As described in Unit 4 - Entity-Relationship Model, dependent entity types are used to model the nondefining attributes of relationship types. Therefore, dependent entity type ENGINE LOCATION is introduced containing attribute Engine Position. Its parent is relationship type ENGINE_on_AIRCRAFT and its owning relationship type is ENGINE_on_AIRCRAFT_in_ENGINE POSITION.

The target cardinality of the owning relationship type is 1..1 because each mounted engine must be in one and only one position of the aircraft. The entity key of ENGINE LOCATION is Engine Number, the relationship key of ENGINE_on_AIRCRAFT. The cascading property for the target of the owning relationship type expresses the fact that the engine position is to be deleted when the engine is taken off the aircraft.

After we have corrected the entity-relationship model, we can establish the corresponding tuple types:

• We need tuple types for the three entity types, i.e., for AIRCRAFT, ENGINE, and ENGINE LOCATION. The tuple type for AIRCRAFT no longer contains engine information. The tuple type for ENGINE contains only the really engine-specific information. Tuple type ENGINE LOCATION describes, for mounted engines, on which engine position they are mounted. It does not specify on which aircraft the engine is mounted.

• We need a tuple type for relationship type ENGINE_on_AIRCRAFT. The tuple type contains the engine number of the mounted engine and the aircraft number of the aircraft on which the engine is mounted.

• Since ENGINE_on_AIRCRAFT_in_ENGINE POSITION is an owning relationship type, we must not have a tuple type for it.

Tuple types ENGINE LOCATION and ENGINE_on_AIRCRAFT have the same primary key. Since every tuple of ENGINE LOCATION has a corresponding tuple with the same primary key value in ENGINE_on_AIRCRAFT and vice versa, the two tuple types can be combined. The resulting tuple type is again called ENGINE_on_AIRCRAFT. We will not further discuss here when tuple types can be combined. We will leave this to the next unit.
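
A possible table layout for the resulting tuple types is sketched below with Python's standard `sqlite3` module. Table names, column names, data types, engine numbers, engine types, and engine positions are assumptions chosen for the illustration; the Manufacturer attributes of ENGINE and the remaining attributes of AIRCRAFT are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE aircraft (
    aircraft_number TEXT PRIMARY KEY
);
CREATE TABLE engine (
    engine_number TEXT PRIMARY KEY,
    engine_type   TEXT NOT NULL
);
-- Combined tuple type: ENGINE_on_AIRCRAFT plus Engine Position from ENGINE LOCATION.
CREATE TABLE engine_on_aircraft (
    engine_number   TEXT PRIMARY KEY REFERENCES engine (engine_number),
    aircraft_number TEXT NOT NULL REFERENCES aircraft (aircraft_number),
    engine_position TEXT NOT NULL
);
""")

conn.execute("INSERT INTO aircraft VALUES ('B171004217')")
conn.executemany("INSERT INTO engine VALUES (?, ?)", [
    ("E-0001", "TYPE-A"),
    ("E-0002", "TYPE-A"),
    ("E-0003", "TYPE-A"),  # spare engine, not mounted on any aircraft
])
conn.executemany("INSERT INTO engine_on_aircraft VALUES (?, ?, ?)", [
    ("E-0001", "B171004217", "LEFT"),
    ("E-0002", "B171004217", "RIGHT"),
])

# Engines without a tuple in engine_on_aircraft are the engines not mounted.
unmounted = [row[0] for row in conn.execute("""
    SELECT e.engine_number
    FROM engine e
    LEFT JOIN engine_on_aircraft ea ON ea.engine_number = e.engine_number
    WHERE ea.engine_number IS NULL
""")]
print(unmounted)  # ['E-0003']
```

Because Engine Number alone is the primary key of `engine_on_aircraft`, an engine can be mounted on at most one aircraft at a time, while information about spare engines is still kept in `engine`.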





Figure 6-22. Second Normal Form - Definition CF182.0

Notes:

Basically, the Second Normal Form deals with the improper assignment of attributes to tuple types. It applies to tuple types whose primary keys consist of more than one elementary attribute.

A tuple type is in the Second Normal Form if:

• It is in First Normal Form.

• All its elementary nonkey attributes are functionally dependent on the entire primary key, i.e., on all attributes belonging to the primary key.

As mentioned before, if a composite attribute belongs to the primary key, all its components belong to the primary key. Thus, the functional dependence must be on all components of the composite attribute.

Similarly, all elementary components of a composite attribute must be functionally dependent on the entire primary key for the Second Normal Form to be satisfied.

The primary key of tuple type FLIGHT for our sample airline company consists of four attributes. All elementary nonkey attributes of the tuple type are functionally dependent on the entire primary key, i.e., on all four attributes. Therefore, tuple type FLIGHT is in Second Normal Form.

Second Normal Form - Definition

A tuple type is in the Second Normal Form if:

• It is in the First Normal Form

• All its elementary nonkey attributes are functionally dependent on the entire primary key

FLIGHT (in 2nd Normal Form)
    Flight Number, PK
    Flight Locator, PK
    Departure AS Planned Departure
        Departure Date
        Departure Time
    . . .
    Arrival AS Actual Arrival [0..1]
        Arrival Date
        Arrival Time

LEG (not in 2nd Normal Form)
    Flight Number, PK
    From, PK
    To, PK
    Leg Number
    Mileage Credit (only dependent on From and To)
    . . .

The primary key of tuple type LEG consists of the three attributes Flight Number, From and To. From and To identify the airport of departure and the airport of arrival for the leg of the considered flight. Leg Number depends on all attributes of the primary key. A different itinerary (flight number) may contain the same nonstop connection as a different leg.

In contrast, attribute Mileage Credit, i.e., the miles credited for the leg on frequent-flyer accounts, does not depend on Flight Number. It only depends on the airport of departure and the airport of arrival, i.e., on From and To. Thus, the tuple type violates the Second Normal Form.
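
Whether sample tuples are compatible with such a dependence can be tested mechanically. The following Python sketch is an illustration under stated assumptions: the function name `functionally_determines` is hypothetical, and the flight numbers, airport codes, and mileage values are made-up sample data, not values from Come Aboard. Keep in mind that sample tuples can only disprove a functional dependence; whether it really holds must be confirmed by the application domain expert.

```python
def functionally_determines(tuples, determinant, dependent):
    """True if equal values for `determinant` always come with equal `dependent` values."""
    seen = {}
    for t in tuples:
        key = tuple(t[a] for a in determinant)
        value = tuple(t[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample tuples for tuple type LEG.
leg = [
    {"Flight Number": "CA100", "From": "SFO", "To": "ORD", "Leg Number": 1, "Mileage Credit": 1800},
    {"Flight Number": "CA100", "From": "ORD", "To": "JFK", "Leg Number": 2, "Mileage Credit": 700},
    {"Flight Number": "CA200", "From": "ORD", "To": "JFK", "Leg Number": 1, "Mileage Credit": 700},
]

primary_key = ["Flight Number", "From", "To"]

# Leg Number needs the entire primary key ...
print(functionally_determines(leg, primary_key, ["Leg Number"]))         # True
print(functionally_determines(leg, ["From", "To"], ["Leg Number"]))      # False

# ... whereas Mileage Credit is already determined by a part of the key.
print(functionally_determines(leg, ["From", "To"], ["Mileage Credit"]))  # True
```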





Figure 6-23. Second Normal Form - Solution CF182.0

Notes:

As mentioned before, the Second Normal Form deals with attributes assigned to the wrong tuple type. Attribute Mileage Credit in our example should not have been assigned to tuple type LEG.

To determine the proper tuple type, you should consult the entity-relationship model for the application domain. There are two possibilities:

• The entity-relationship model contains the entity type to which the improperly assigned attribute really belongs. In this case, add the attribute to the tuple type for the entity type.

• The entity-relationship model is incomplete since it does not contain the proper entity type for the attribute. In this case, correct the entity-relationship model and reestablish the tuple types concerned based on the corrected entity-relationship model.

In case of our example, the entity-relationship model is missing the proper entity type for attribute Mileage Credit. As a matter of fact, Mileage Credit is rather a nondefining attribute for nonstop connections, i.e., for relationship type AIRPORT_nonstop_to_AIRPORT. Thus, it is modeled as a dependent entity type for that relationship type as illustrated on the visual. The dependent entity type is called NONSTOP CONNECTION.

Second Normal Form - Solution

Visual: entity-relationship model with entity types AIRPORT, ITINERARY, NONSTOP CONNECTION, and LEG. NONSTOP CONNECTION is a dependent entity type of relationship type AIRPORT_nonstop_to_AIRPORT (m:m, From/To). LEG is a dependent entity type of the relationship type between NONSTOP CONNECTION and ITINERARY. Both owning relationship types have a target cardinality of 1..1 (D, C).

No tuple types for any of the relationship types

LEG
    Flight Number, PK
    Airport Code AS From, PK
    Airport Code AS To, PK
    Leg Number
    . . .

NONSTOP CONNECTION
    Airport Code AS From, PK
    Airport Code AS To, PK
    Mileage Credit

Tuple types for AIRPORT and ITINERARY unchanged

The cardinality of 1..1 for the target of the owning relationship type requires the mileage credit to be provided when the nonstop connection is established.

Having introduced dependent entity type NONSTOP CONNECTION, the relationship type specifying the nonstop connections for the various itineraries can now interconnect entity types NONSTOP CONNECTION and ITINERARY. It need no longer interconnect relationship type AIRPORT_nonstop_to_AIRPORT and entity type ITINERARY. The new relationship type is called NONSTOP CONNECTION_in_ITINERARY. As a consequence, dependent entity type LEG must now be based on this relationship type.

Of course, the problem statement for the application domain and the data inventory should be updated accordingly by the application domain expert and the database designer.

After we have corrected the entity-relationship model, we can reestablish the tuple types for the entity types and relationship types concerned:

• The tuple types for entity types AIRPORT and ITINERARY remain unchanged.

• The new tuple type NONSTOP CONNECTION contains attribute Mileage Credit and the key of the dependent entity type, i.e., the attributes From and To.

• The tuple type for entity type LEG no longer contains attribute Mileage Credit.

• Tuple types must not be provided for any of the relationship types on the visual for the following reasons:

- Relationship types AIRPORT_nonstop_to_AIRPORT and NONSTOP CONNECTION_in_ITINERARY are m:m relationship types being the source of other relationship types with a minimum target cardinality of 1 (see page 6-14).

- Relationship types AIRPORT_nonstop_to_AIRPORT_in_NONSTOP CONNECTION and NONSTOP CONNECTION_in_ITINERARY_as_LEG are owning relationship types (see page 6-13).
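
The reestablished tuple types NONSTOP CONNECTION and LEG could be laid out as tables roughly as follows. This Python sketch uses the standard `sqlite3` module; table names, column names, data types, and all sample values are assumptions made for the illustration, and the tables for AIRPORT and ITINERARY are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE nonstop_connection (
    from_airport   TEXT NOT NULL,
    to_airport     TEXT NOT NULL,
    mileage_credit INTEGER NOT NULL,     -- stored once per nonstop connection
    PRIMARY KEY (from_airport, to_airport)
);
CREATE TABLE leg (
    flight_number TEXT NOT NULL,
    from_airport  TEXT NOT NULL,
    to_airport    TEXT NOT NULL,
    leg_number    INTEGER NOT NULL,
    PRIMARY KEY (flight_number, from_airport, to_airport),
    FOREIGN KEY (from_airport, to_airport)
        REFERENCES nonstop_connection (from_airport, to_airport)
);
""")

conn.executemany("INSERT INTO nonstop_connection VALUES (?, ?, ?)", [
    ("SFO", "ORD", 1800),
    ("ORD", "JFK", 700),
])
conn.executemany("INSERT INTO leg VALUES (?, ?, ?, ?)", [
    ("CA100", "SFO", "ORD", 1),
    ("CA100", "ORD", "JFK", 2),
    ("CA200", "ORD", "JFK", 1),
])

# The mileage credit of a flight is obtained by joining its legs with their connections.
credits = conn.execute("""
    SELECT l.flight_number, SUM(n.mileage_credit)
    FROM leg l
    JOIN nonstop_connection n
      ON n.from_airport = l.from_airport AND n.to_airport = l.to_airport
    GROUP BY l.flight_number
    ORDER BY l.flight_number
""").fetchall()
print(credits)  # [('CA100', 2500), ('CA200', 700)]
```

Changing the mileage credit for a nonstop connection now touches a single row, no matter how many itineraries contain that connection.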

During the establishment of the entity-relationship model for Come Aboard, we already resolved another violation of the Second Normal Form. The attributes of entity type AIRCRAFT TYPE originally belonged to entity type AIRCRAFT MODEL which represented a violation of the Second Normal Form.





Figure 6-24. Third Normal Form - Definition CF182.0

Notes:

The Third Normal Form requires that a tuple type is in Second Normal Form and none of its elementary nonkey attributes is functionally dependent on other nonkey attributes.

If attribute-1 and attribute-2 are attributes of a tuple type, attribute-2 is functionally dependent on attribute-1 if, for each occurrence of a value of attribute-1, attribute-2 assumes the same value. For different values of attribute-1, attribute-2 may assume different values. However, for the same value of attribute-1, it must always assume the same value. Functional dependence may not just exist on a single elementary attribute; it can also exist on a composite attribute, meaning dependence on all components, or on a set of attributes.

For the Third Normal Form, functional independence is not only required for the direct elementary attributes of the tuple type, but for all components of composite attributes. This means, it is required for all elementary attributes of the tree structure for the tuple type. Furthermore, there must not be a functional dependence on components of composite attributes.

Third Normal Form - Definition

A tuple type is in the Third Normal Form if:

• It is in the Second Normal Form

• None of its elementary nonkey attributes is functionally dependent on other nonkey attributes

AIRCRAFT MODEL (in 3rd Normal Form)
    Type Code, PK
    Model Number, PK
    Dimensions
        Length
        Height
        Wing Span
    Weights
        Maximum Weight
        Net Weight
    Cruising Speed

AIRCRAFT TYPE (not in 3rd Normal Form)
    Type Code, PK
    Category
    Number of Engines
    Manufacturer
        Manufacturer Code
        Company Name
        Address
            Street [0..1]
            City
            State [0..1]
            Country
            Postal Code [0..1]
        Phone Number

Company Name, Address, and Phone Number are functionally dependent on Manufacturer Code.

Tuple type AIRCRAFT MODEL on the visual is in Third Normal Form because none of its elementary nonkey attributes is dependent on other nonkey attributes. The dimensions, weights, and the cruising speed are all functionally independent of each other. For the same dimensions, different weights and cruising speeds may apply and vice versa.

In tuple type AIRCRAFT TYPE, elementary attributes Company Name, Phone Number, and all components of composite attribute Address are functionally dependent on attribute Manufacturer Code. Thus, tuple type AIRCRAFT TYPE is not in Third Normal Form.

Violations of the Third Normal Form can lead to inconsistent tuples as a consequence of update operations changing only some of the tuples with the same dependent values. They may also lead to the loss of the dependent information if the last tuple for a value is deleted.
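
Both effects can be reproduced with a small experiment. The sketch below uses Python's standard `sqlite3` module and a shortened, unnormalized version of tuple type AIRCRAFT TYPE; the table layout and the new city value CHICAGO are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE aircraft_type (
    type_code         TEXT PRIMARY KEY,
    manufacturer_code TEXT NOT NULL,
    company_name      TEXT NOT NULL,  -- depends on manufacturer_code
    city              TEXT NOT NULL,  -- depends on manufacturer_code
    number_of_engines INTEGER NOT NULL
)
""")
conn.executemany("INSERT INTO aircraft_type VALUES (?, ?, ?, ?, ?)", [
    ("B747", "BOEING", "BOEING CORPORATION", "SEATTLE", 4),
    ("B737", "BOEING", "BOEING CORPORATION", "SEATTLE", 2),
    ("A310", "AIRBUS", "AIRBUS INDUSTRIES", "TOULOUSE", 2),
])

# Update changing only some of the tuples with the same dependent values.
conn.execute("UPDATE aircraft_type SET city = 'CHICAGO' WHERE type_code = 'B747'")
boeing_cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT city FROM aircraft_type "
    "WHERE manufacturer_code = 'BOEING' ORDER BY city"
)]
print(boeing_cities)  # ['CHICAGO', 'SEATTLE'] (inconsistent)

# Deleting the last tuple for a manufacturer loses the manufacturer information.
conn.execute("DELETE FROM aircraft_type WHERE type_code = 'A310'")
airbus_rows = conn.execute(
    "SELECT COUNT(*) FROM aircraft_type WHERE manufacturer_code = 'AIRBUS'"
).fetchone()[0]
print(airbus_rows)  # 0
```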

If the data groups for the composite attributes of a tuple type were established properly, all related functional dependences should be within the same composite attribute. For our example, they are all in composite attribute Manufacturer. Thus, the usage of properly created composite attributes can ease your task of determining functional dependences. If you have not formed data groups/composite attributes or have not established them correctly, functional dependences may exist across composite attributes.





Figure 6-25. Third Normal Form - Solution CF182.0

Notes:

To solve a violation of the Third Normal Form, you must move all attributes being functionally dependent on the same set of attributes to a new tuple type. The attributes the moved attributes were dependent on are repeated in the new tuple type. They become the primary key of the new tuple type.

In case of our example, the attributes Company Name, Phone Number and all components of Address are removed from tuple type AIRCRAFT TYPE. They become attributes of a new tuple type MANUFACTURER. Attribute Manufacturer Code remains in tuple type AIRCRAFT TYPE, but is repeated in MANUFACTURER. It becomes the primary key of MANUFACTURER. In this way, the association between aircraft types and manufacturers is maintained.

If the composite attributes for a tuple type have been formed correctly, the resolution of a Third Normal Form violation incorporates the following:

• A new tuple type is created for the entire composite attribute having the functional dependences.

• The primary key of the new tuple type is repeated (remains) in the original tuple type.

Third Normal Form - Solution

AIRCRAFT TYPE
    Type Code, PK
    Category
    Manufacturer Code
    Number of Engines

MANUFACTURER
    Manufacturer Code, PK
    Company Name
    Address
        Street [0..1]
        City
        State [0..1]
        Country
        Postal Code [0..1]
    Phone Number

For our sample tuple type, the composite attributes have been formed correctly. Accordingly, a new tuple type MANUFACTURER has been created for composite attribute Manufacturer and the primary key of that tuple type is repeated in the original tuple type.





Figure 6-26. Third Normal Form - Instance Example CF182.0

Notes:

This visual gives an instance example for the tuple types of the previous visuals. However, because of the limited size of the visual, some attributes have been omitted: Category, Street, Post Office Box, Postal Code, and Phone Number are not shown. The tuple types have been presented in form of tables to show multiple instances for them.

The top portion of the visual illustrates tuple type AIRCRAFT TYPE before normalization. The information for a manufacturer is (must be) repeated for each aircraft type produced by him/her. As you can envisage, this leads to inconsistent information if only some of the tuples for a manufacturer are updated when the manufacturer information changes.

The bottom half of the visual illustrates the situation after normalization. Tuple type AIRCRAFT TYPE now only contains the manufacturer code and no longer the information functionally dependent on it. The information for a manufacturer is contained in tuple type MANUFACTURER. Tuple type MANUFACTURER contains one tuple for every manufacturer.

The new tuple type allows Come Aboard to store information about manufacturers without having aircraft types from them. This was not possible before normalization.
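
The normalization step of the instance example can also be expressed as a small program. The following Python sketch is illustrative only: the function `decompose` is a hypothetical helper, tuples are assumed to be dictionaries, and only three aircraft types with the attributes shown on the visual are used.

```python
# Tuple type AIRCRAFT TYPE before normalization (shortened).
before = [
    {"Type Code": "B747", "Manufacturer Code": "BOEING", "Company Name": "BOEING CORPORATION",
     "City": "SEATTLE", "State": "WA", "Country": "USA", "Number of Engines": 4},
    {"Type Code": "A310", "Manufacturer Code": "AIRBUS", "Company Name": "AIRBUS INDUSTRIES",
     "City": "TOULOUSE", "State": None, "Country": "FRANCE", "Number of Engines": 2},
    {"Type Code": "B737", "Manufacturer Code": "BOEING", "Company Name": "BOEING CORPORATION",
     "City": "SEATTLE", "State": "WA", "Country": "USA", "Number of Engines": 2},
]


def decompose(tuples, determinant, dependents):
    """Move the attributes depending on `determinant` to a new tuple type.

    `determinant` stays in the original tuples and becomes the primary key
    of the new tuple type, which holds one tuple per determinant value.
    """
    remaining, extracted = [], {}
    for t in tuples:
        remaining.append({k: v for k, v in t.items() if k not in dependents})
        extracted[t[determinant]] = {determinant: t[determinant],
                                     **{a: t[a] for a in dependents}}
    return remaining, list(extracted.values())


aircraft_type, manufacturer = decompose(
    before, "Manufacturer Code", ["Company Name", "City", "State", "Country"]
)
print(len(aircraft_type), len(manufacturer))  # 3 2
print(aircraft_type[0])
# {'Type Code': 'B747', 'Manufacturer Code': 'BOEING', 'Number of Engines': 4}
```

After the decomposition, a tuple can be added to `manufacturer` for a manufacturer without any aircraft type, which the unnormalized tuple type did not allow.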

Third Normal Form - Instance Example

BEFORE

AIRCRAFT TYPE
Type Code  Manufacturer Code  Company Name        City      State  Country  Number of Engines
B747       BOEING             BOEING CORPORATION  SEATTLE   WA     USA      4
A310       AIRBUS             AIRBUS INDUSTRIES   TOULOUSE         FRANCE   2
. . .

(Manufacturer Code, Company Name, City, State, and Country belong to composite attribute Manufacturer.)

AFTER

AIRCRAFT TYPE
Type Code  Manufacturer Code  Number of Engines
B747       BOEING             4
A310       AIRBUS             2
A340       AIRBUS             4
B737       BOEING             2
A319       AIRBUS             2
B777       BOEING             2

MANUFACTURER
Manufacturer Code  Company Name        City      State  Country
BOEING             BOEING CORPORATION  SEATTLE   WA     USA
AIRBUS             AIRBUS INDUSTRIES   TOULOUSE         FRANCE

Figure 6-27. Third Normal Form - ER Model Correction CF182.0

Notes:

The fact that a new tuple type has been created to comply with the Third Normal Form should be reflected in the entity-relationship model. The creation of a new tuple type really means that the appropriate information has become an independent conceptual unit representing a class of objects with the same meaning and characteristics. Thus, it means that the entity-relationship model should contain a new entity type.

Since the new tuple type has an association with the old tuple type, the new entity type must have a relationship type with the entity type (or relationship type) for the old tuple type.

In case of our example, the entity-relationship model must be extended by a new entity type (MANUFACTURER) and a new relationship type between entity types AIRCRAFT TYPE and MANUFACTURER. The relationship type is called AIRCRAFT TYPE_from_MANUFACTURER. The relationship type is a 1:m relationship type: An aircraft type can be from one and only one manufacturer, but a manufacturer may manufacture multiple aircraft types. Accordingly, the key of the relationship type is Type Code, the key of entity type AIRCRAFT TYPE.

Third Normal Form - ER Model Correction

Entity-relationship model: AIRCRAFT TYPE _from_ MANUFACTURER (source cardinality m, target cardinality 1..1)

Tuple types based on the corrected entity-relationship model:

AIRCRAFT TYPE
    Type Code, PK
    Category
    Number of Engines

AIRCRAFT TYPE_from_MANUFACTURER
    Type Code, PK
    Manufacturer Code

MANUFACTURER
    Manufacturer Code, PK
    Company Name
    Address
        Street [0..1]
        Post Office Box [0..1]
        City
        State [0..1]
        Country
        Postal Code [0..1]
    Phone Number

AIRCRAFT TYPE and AIRCRAFT TYPE_from_MANUFACTURER combined:

AIRCRAFT TYPE
    Type Code, PK
    Category
    Manufacturer Code
    Number of Engines

The source cardinality of m (0..m) indicates that Come Aboard wants to keep information about manufacturers even if it does not own one of their aircraft types. However, you must verify this with the application domain expert. The unnormalized tuple type for AIRCRAFT TYPE would not have allowed you to store information about manufacturers without an aircraft type. This was a further reason for resolving the violation of the Third Normal Form.

As a matter of principle, you should correct the entity-relationship model first and then reestablish the tuple types based on the corrected entity-relationship model. When reestablishing the tuple types based on the corrected entity-relationship model, you get tuple types for entity types AIRCRAFT TYPE and MANUFACTURER and for relationship type AIRCRAFT TYPE_from_MANUFACTURER.

The tuple type for AIRCRAFT TYPE does not contain attribute Manufacturer Code! The interrelationship between aircraft types and manufacturers is rather expressed by tuple type AIRCRAFT TYPE_from_MANUFACTURER.

The fact that we get three tuple types seems to be conflicting with the solution developed before. It is not. Tuple types AIRCRAFT TYPE and AIRCRAFT TYPE_from_MANUFACTURER have the same primary key. For each tuple in AIRCRAFT TYPE, AIRCRAFT TYPE_from_MANUFACTURER has a corresponding tuple, and vice versa. Therefore, the two tuple types can be combined as will be discussed further in the subsequent unit.




Figure 6-28. Third Normal Form in Multiple Tuple Types CF182.0

Notes:

This visual illustrates another violation of the Third Normal Form for our sample airline company. Tuple type ENGINE contains the same composite attribute Manufacturer as tuple type AIRCRAFT TYPE before normalization. Thus, it violates the Third Normal Form as well.

The resolution of the violation is the same as for AIRCRAFT TYPE. The composite attribute forms its own tuple type (MANUFACTURER). Except for Manufacturer Code, the attributes of composite attribute Manufacturer are removed from tuple type ENGINE.

This raises the question of whether tuple type MANUFACTURER is the same as the one created for AIRCRAFT TYPE. Both tuple types have the same attributes. As usual, the question must be answered by the application domain expert. From the real world, we know that some manufacturers produce both aircraft and engines whereas others only manufacture engines or aircraft. Thus, how should we solve the problem? The alternatives are discussed on the next two visuals.

Third Normal Form in Multiple Tuple Types

ENGINE (before normalization)
    Engine Number, PK
    Engine Type
    Manufacturer
        Manufacturer Code
        Company Name
        Address
        Phone Number

3NF:

ENGINE
    Engine Number, PK
    Engine Type
    Manufacturer Code

MANUFACTURER
    Manufacturer Code, PK
    Company Name
    Address
    Phone Number

Is this the same MANUFACTURER as for AIRCRAFT TYPE???

Figure 6-29. 3rd NF in Multiple Tuple Types (Alternative 1) CF182.0

Notes:

This visual illustrates a possible solution for the problem raised on the previous visual. The solution is discussed for the entity-relationship model changes required. The tuple types then follow automatically. Since the attributes for engine and aircraft manufacturers are the same, you can use the same entity type MANUFACTURER (and, thus, tuple type) to store information about both. For each manufacturer, the entity type contains one entity instance.

To complete the entity-relationship model, you need relationship types AIRCRAFT TYPE_from_MANUFACTURER and ENGINE_from_MANUFACTURER expressing the interrelationships between aircraft types and manufacturers and engines and manufacturers.

However, the solution has one problem: It is possible to establish relationships between aircraft types and manufacturers just producing engines and between engines and manufacturers only manufacturing aircraft.

3rd NF in Multiple Tuple Types (Alternative 1)

Visual: entity-relationship model with entity types AIRCRAFT MODEL, AIRCRAFT TYPE, AIRCRAFT, ENGINE, ENGINE LOCATION, and MANUFACTURER. AIRCRAFT TYPE and ENGINE are each connected to the single entity type MANUFACTURER by a relationship type _from_ (source cardinality m, target cardinality 1..1).

This problem is generally considered a data-entry problem. Your data would also be wrong if you specified the wrong aircraft manufacturer for an aircraft type or the wrong engine manufacturer for an engine. Therefore, most application domain experts and database designers will just go with this solution without further constraints. To solve the problem completely, you can:

• introduce an additional attribute Manufacturer Type specifying the type of manufacturer (engine manufacturer, aircraft manufacturer, or both engine and aircraft manufacturer)

• define constraints for relationship types AIRCRAFT TYPE_from_MANUFACTURER and ENGINE_from_MANUFACTURER restricting the instances of the relationship types based on the values of attribute Manufacturer Type of entity type MANUFACTURER.

The constraints for the relationship types prevent the improper assignment of manufacturers.
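The two measures listed above can be sketched in code. The following is a minimal sketch only, written in Python with the standard sqlite3 module. The table and column names, the type codes 'A', 'E', and 'B', the company names, and the use of a trigger to express the relationship constraint are all assumptions made for illustration; they are not taken from the Come Aboard model.

```python
import sqlite3

# Assumed names and codes: 'A' = aircraft manufacturer, 'E' = engine
# manufacturer, 'B' = both. None of these come from the course material.
SCHEMA = """
CREATE TABLE MANUFACTURER (
    MANUFACTURER_CODE TEXT PRIMARY KEY,
    COMPANY_NAME      TEXT NOT NULL,
    MANUFACTURER_TYPE TEXT NOT NULL
        CHECK (MANUFACTURER_TYPE IN ('A', 'E', 'B'))
);
CREATE TABLE AIRCRAFT_TYPE (
    TYPE_CODE         TEXT PRIMARY KEY,
    MANUFACTURER_CODE TEXT NOT NULL
        REFERENCES MANUFACTURER (MANUFACTURER_CODE)
);
-- Constraint for AIRCRAFT TYPE_from_MANUFACTURER: reject manufacturers
-- that only produce engines.
CREATE TRIGGER AIRCRAFT_TYPE_MANUFACTURER_CHECK
BEFORE INSERT ON AIRCRAFT_TYPE
WHEN (SELECT MANUFACTURER_TYPE FROM MANUFACTURER
      WHERE MANUFACTURER_CODE = NEW.MANUFACTURER_CODE) = 'E'
BEGIN
    SELECT RAISE(ABORT, 'manufacturer does not produce aircraft');
END;
"""


def try_insert(conn, type_code, manufacturer_code):
    """Return True if the aircraft type could be stored, False if rejected."""
    try:
        conn.execute("INSERT INTO AIRCRAFT_TYPE VALUES (?, ?)",
                     (type_code, manufacturer_code))
        return True
    except sqlite3.DatabaseError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO MANUFACTURER VALUES (?, ?, ?)",
    [("M1", "Airframe Works", "A"),   # fictitious companies
     ("M2", "Engine Works", "E"),
     ("M3", "Combined Works", "B")],
)

# Try to assign each manufacturer to a (fictitious) aircraft type.
accepted = {code: try_insert(conn, "T-" + code, code)
            for code in ("M1", "M2", "M3")}
print(accepted)
```

A corresponding trigger on a table for ENGINE would reject manufacturers of type 'A'. Whether the extra attribute and constraints are worth the effort is the design decision discussed above.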





Figure 6-30. 3rd NF in Multiple Tuple Types (Alternative 2) CF182.0

Notes:

This visual illustrates an alternate solution using supertypes and subtypes.

MANUFACTURER is made a supertype for subtypes AIRCRAFT MANUFACTURER and ENGINE MANUFACTURER. MANUFACTURER contains instances for all manufacturers. AIRCRAFT MANUFACTURER contains instances for manufacturers producing aircraft (and possibly engines) and ENGINE MANUFACTURER instances for manufacturers producing engines (and possibly aircraft).

In addition, relationship types are established between entity type AIRCRAFT TYPE and subtype AIRCRAFT MANUFACTURER and between entity type ENGINE and subtype ENGINE MANUFACTURER.

Unless you want to store additional information for the different manufacturer types (a possible by-product of the solution), subtypes AIRCRAFT MANUFACTURER and ENGINE MANUFACTURER have a single attribute (Manufacturer Code). Therefore, if you do not have additional information for the different manufacturer types, the solution would probably be considered excessive.

Visual: 3rd NF in Multiple Tuple Types (Alternative 2). Entity-relationship diagram with entity types AIRCRAFT TYPE, AIRCRAFT MODEL, AIRCRAFT, ENGINE, ENGINE LOCATION, and MANUFACTURER. MANUFACTURER is the supertype (S) of subtypes AIRCRAFT MANUFACTURER and ENGINE MANUFACTURER (relationship types _is_). AIRCRAFT TYPE and ENGINE are each connected to the matching subtype by a _from_ relationship type.


Figure 6-31. Fourth Normal Form - Definition CF182.0

Notes:

The Fourth Normal Form requires that:

• the tuple type is in the Third Normal Form and

• its attributes do not have multivalued dependencies on each other.

A multivalued dependency involves three attributes. If attribute-1, attribute-2, and attribute-3 are attributes of the same tuple type, attribute-3 is said to be multivalued dependent on attribute-1 by the way of attribute-2 if the following is true:

For each value of attribute-2 occurring with a specific, but arbitrary, value of attribute-1, the tuple type must contain tuples with the same values for attribute-3.

To make this definition more understandable, let us assume a21, a22, and a23 are values of attribute-2 occurring with value a11 for attribute-1 in tuples of the tuple type. Furthermore, assume that the tuples for a11 and a21 have the following values for attribute-3:

attribute-1   attribute-2   attribute-3
a11           a21           a31
a11           a21           a32
a11           a21           a33
a11           a21           a34

Then, multivalued dependency of attribute-3 on attribute-1 means that the following tuples must exist for the combination a11 and a22:

attribute-1   attribute-2   attribute-3
a11           a22           a31
a11           a22           a32
a11           a22           a33
a11           a22           a34

Similarly, for the combination a11 and a23, the following tuples must exist:

attribute-1   attribute-2   attribute-3
a11           a23           a31
a11           a23           a32
a11           a23           a33
a11           a23           a34

Multivalued dependencies may lead to group inconsistencies due to insert, delete, or update operations paying no attention to the multivalued dependency.

Visual: Fourth Normal Form - Definition (restates the two requirements and the definition of multivalued dependency given in these notes).




Figure 6-32. Fourth Normal Form - Sample Tuple Type CF182.0

Notes:

The above tuple type did not result from creating the tuple types for our sample airline company called Come Aboard. It was constructed artificially, by joining the tuple types for two m:m relationship types, to demonstrate a violation of the Fourth Normal Form.

The tuple type lists, for the various aircraft models, both the pilots that can fly them and the mechanics that are trained for them, i.e., can maintain them.

Each tuple contains an aircraft model (type code and model number), a pilot employee number, and a mechanic employee number. Composite attribute Aircraft Model has been used for clarity. It groups the two attributes Type Code and Model Number, which uniquely identify aircraft models. All cardinalities of the tuple type are implicitly defined and, therefore, are [1..1]. Thus, the tuple type is in First Normal Form.

All attributes belong to the primary key for the tuple type since:

• An aircraft model can be flown by many pilots, and a pilot can fly many aircraft models.

• Many mechanics may be trained for an aircraft model, and a mechanic may be trained for many aircraft models.

Consequently, the tuple type is in the Second Normal Form and even in the Third Normal Form.

It is a further assumption for the tuple type that there are not any special interdependencies between pilots and mechanics.

Visual: Fourth Normal Form - Sample Tuple Type

PILOTS_and_MECHANICS_for_AIRCRAFT MODEL
Aircraft Model, PK (Type Code, Model Number)
Employee Number AS Pilot Employee Number, PK
Employee Number AS Mechanic Employee Number, PK

The tuple type lists, in the same tuple type, the pilots that can fly an aircraft model and the mechanics that can maintain (are trained for) an aircraft model.




Figure 6-33. Fourth Normal Form - Instance Example CF182.0

Notes:

This visual illustrates an instance example for the tuple type explained on the previous visual. Attribute Mechanic Employee Number is multivalued dependent on composite attribute Aircraft Model, i.e., on Type Code and Model Number, by the way of Pilot Employee Number:

• Take a specific aircraft model, for example, the Boeing B747, Model 400.

• It occurs together with pilot employee numbers 0491337, 0844092, and 0003613.

• For the selected aircraft model and pilot employee number 0491337, the mechanic employee numbers are 5219330 and 6027005.

• Since the mechanics trained for an aircraft model have nothing to do with the pilots, the same mechanics must be listed for pilot numbers 0844092 and 0003613.

• Similar considerations apply to any other aircraft model selected. For the Airbus A310, Model 300, tuples with the same mechanic employee numbers must exist for pilots 3721040 and 1662951.

Visual: Fourth Normal Form - Instance Example

Type Code   Model Number   Pilot Employee Number   Mechanic Employee Number
B747        400            0491337                 5219330
B747        400            0491337                 6027005
B747        400            0844092                 5219330
B747        400            0844092                 6027005
B747        400            0003613                 5219330
B747        400            0003613                 6027005
A310        300            1662951                 4421026
A310        300            1662951                 6027005
A310        300            1662951                 1427254
A310        300            3721040                 4421026
A310        300            3721040                 6027005
A310        300            3721040                 1427254

Within each aircraft model, the same mechanic employee numbers repeat for every pilot. Mechanic Employee Number is multivalued dependent on Aircraft Model (Type Code, Model Number) by the way of Pilot Employee Number.

Accordingly, the tuple type violates the Fourth Normal Form. Improper insertions or deletions of tuples could result in inconsistent data by violating the multivalued dependencies. For example, if the tuple for aircraft model B747, Model 400, pilot employee number 0844092, and mechanic employee number 6027005 were deleted, the data would be inconsistent.
By the way, multivalued dependencies always come in pairs. If attribute-3 is multivalued dependent on attribute-1 by the way of attribute-2, then attribute-2 is multivalued dependent on attribute-1 by the way of attribute-3. In case of our example, Pilot Employee Number is multivalued dependent on Aircraft Model by the way of Mechanic Employee Number. The proof is left to you.
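The consistency requirement described above can be checked mechanically. The following Python sketch is an illustration only: the function name missing_tuples and the representation of tuples as Python tuples of strings are assumptions. It uses the instance data from the visual and reports which tuples the multivalued dependency requires but the data does not contain.

```python
from collections import defaultdict

# Instance data from the visual: (type code, model number,
# pilot employee number, mechanic employee number).
TUPLES = {
    ("B747", "400", "0491337", "5219330"),
    ("B747", "400", "0491337", "6027005"),
    ("B747", "400", "0844092", "5219330"),
    ("B747", "400", "0844092", "6027005"),
    ("B747", "400", "0003613", "5219330"),
    ("B747", "400", "0003613", "6027005"),
    ("A310", "300", "1662951", "4421026"),
    ("A310", "300", "1662951", "6027005"),
    ("A310", "300", "1662951", "1427254"),
    ("A310", "300", "3721040", "4421026"),
    ("A310", "300", "3721040", "6027005"),
    ("A310", "300", "3721040", "1427254"),
}


def missing_tuples(tuples):
    """Return the tuples required by the multivalued dependency but absent.

    For every aircraft model, each pilot listed for the model must occur
    with each mechanic listed for the model.
    """
    pilots = defaultdict(set)
    mechanics = defaultdict(set)
    for type_code, model_number, pilot, mechanic in tuples:
        model = (type_code, model_number)
        pilots[model].add(pilot)
        mechanics[model].add(mechanic)
    required = {
        model + (pilot, mechanic)
        for model in pilots
        for pilot in pilots[model]
        for mechanic in mechanics[model]
    }
    return required - set(tuples)


# The deletion discussed in the notes above.
deleted = ("B747", "400", "0844092", "6027005")
after_delete = TUPLES - {deleted}

print(missing_tuples(TUPLES))        # set()
print(missing_tuples(after_delete))  # contains exactly the deleted tuple
```

Because the check treats pilots and mechanics symmetrically, it also reflects the observation that multivalued dependencies come in pairs.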




Figure 6-34. Fourth Normal Form - Solution CF182.0

Notes:

To solve Fourth Normal Form violations, the multivalued interdependencies between the attributes must be unbundled by creating separate tuple types. One tuple type is created for each relationship type. Accordingly, you get a tuple type for:

• the interdependency between aircraft models and pilots, which is nothing other than relationship type PILOT_can_fly_AIRCRAFT MODEL

• the interdependency between aircraft models and mechanics corresponding to relationship type MECHANIC_trained_for_AIRCRAFT MODEL

Since each tuple type only contains a single employee number, AS clauses need not be used. The purpose of the employee numbers is apparent from the meaning of the tuple types.

The instances for the two tuple types are shown on the right-hand side of the visual.

If you have properly identified all relationship types in the entity-relationship model and have not hidden them in entity types, you should not have violations of the Fourth Normal Form. The above example was created by joining the tuple types for the two relationship types.

Visual: Fourth Normal Form - Solution

PILOT_can_fly_AIRCRAFT MODEL

Type Code   Model Number   Employee Number
B747        400            0491337
B747        400            0844092
B747        400            0003613
A310        300            1662951
A310        300            3721040

MECHANIC_trained_for_AIRCRAFT MODEL

Type Code   Model Number   Employee Number
B747        400            5219330
B747        400            6027005
A310        300            4421026
A310        300            6027005
A310        300            1427254

Violations of Fourth Normal Form should not occur if the entity-relationship model has been established correctly.
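To see that the two separate tuple types carry the same information as the combined one, you can join them on the aircraft model. The following Python sketch is an illustration under assumptions: the function name join_on_aircraft_model and the set-of-tuples representation are not part of the course material. The data is taken from the two tuple types of the solution.

```python
# (type code, model number, employee number) for each relationship type.
PILOT_CAN_FLY = {
    ("B747", "400", "0491337"),
    ("B747", "400", "0844092"),
    ("B747", "400", "0003613"),
    ("A310", "300", "1662951"),
    ("A310", "300", "3721040"),
}
MECHANIC_TRAINED_FOR = {
    ("B747", "400", "5219330"),
    ("B747", "400", "6027005"),
    ("A310", "300", "4421026"),
    ("A310", "300", "6027005"),
    ("A310", "300", "1427254"),
}


def join_on_aircraft_model(can_fly, trained_for):
    """Pair every pilot of an aircraft model with every mechanic of it."""
    return {
        (type_code, model_number, pilot, mechanic)
        for type_code, model_number, pilot in can_fly
        for other_type, other_model, mechanic in trained_for
        if (type_code, model_number) == (other_type, other_model)
    }


# Joining the two tuple types yields the combined tuple type again.
joined = join_on_aircraft_model(PILOT_CAN_FLY, MECHANIC_TRAINED_FOR)
print(len(joined))  # 12

# Removing one pilot qualification is now a single-tuple operation, and
# the joined result cannot become inconsistent.
remaining = PILOT_CAN_FLY - {("B747", "400", "0844092")}
rejoined = join_on_aircraft_model(remaining, MECHANIC_TRAINED_FOR)
print(len(rejoined))  # 10
```

In the combined tuple type, the same change would have required deleting two tuples together; in the separate tuple types, there is nothing to keep in step.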



Checkpoint


1. Tuple types are the first result of storage view. (T/F)

2. Tuple types must not have composite attributes. (T/F)

3. What is the purpose of cardinalities for attributes of tuple types?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

4. Tuple types are established for:

a. The dependent entity types only of the entity-relationship model.

b. All entity types of the entity-relationship model except dependent entity types.

c. All entity types of the entity-relationship model.

d. All relationship types of the entity-relationship model.

e. Most relationship types of the entity-relationship model.

f. Owning relationship types.

5. For an entity type, multiple tuple types may be established. (T/F)

6. How are the tuple types for entity types established?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________




7. For which relationship types must tuple types not be established?
_____________________________________________________

_____________________________________________________

_____________________________________________________

8. In the documentation of a tuple type, how are the components of a composite attribute identified?

_____________________________________________________

_____________________________________________________

_____________________________________________________

9. In the documentation of a tuple type, what is the means for identifying the role a data element or data group plays for an attribute?

_____________________________________________________

_____________________________________________________

_____________________________________________________

10. Which cardinality is assumed if none has been specified for an attribute in the tuple type documentation?

a. [0..1]

b. [0..*]

c. [1..*]

d. [1..1]

11. Establish the tuple type for relationship type MAINTENANCE RECORD_belongs_to_ MAINTENANCE RECORD for our sample airline company called Come Aboard.

_____________________________________________________

_____________________________________________________

_____________________________________________________

12. Which of the following are objectives of normalizing tuple types? Normalization aims to:

a. Avoid difficulties with the conversion of tuple types into tables.

b. Improve the performance of the database being designed.

c. Reduce the size of the tuple types.

d. Remove redundancies within tuple types.

e. Remove redundancies across tuple types.

f. Avoid data inconsistencies resulting from insert, update, or delete operations.

13. What do the Normal Forms define?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

14. What do you achieve by resolving violations of the First Normal Form?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

15. How do you recognize repeating groups?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

16. Which modifications of the entity-relationship model does the resolution of a First Normal Form violation generally require?

_____________________________________________________
_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

17. If a tuple type is in Second Normal Form, none of its elementary nonkey attributes is functionally dependent on other nonkey attributes. (T/F)

18. If a tuple type is in Third Normal Form, all its elementary nonkey attributes are functionally dependent on the entire primary key. (T/F)

19. Which modifications does the resolution of a Third Normal Form violation generally require for the entity-relationship model?

_____________________________________________________

_____________________________________________________

_____________________________________________________

20. How can data groups help during normalization?

_____________________________________________________

_____________________________________________________

_____________________________________________________

21. Tuple types for relationship types can never violate any of the Normal Forms. (T/F)





Notes:


• Tuple types are the first (intermediate) result of storage view

- Form the basis for the computerized processing of the entity types and relationship types of the entity-relationship model

• Tuple types are established for all entity types and most relationship types of the entity-relationship model

- None for owning relationship types

- None for m:m relationship types being the source (target) of another relationship type with a minimum target (source) cardinality of 1

• Tuple types for entity types consist of attributes for entity type

- Primary key consists of attributes of entity key

• Tuple types for relationship types consist of defining attributes for relationship type

- Primary key consists of attributes of relationship key

• Tuple types consist of attributes and have a primary key

- Attributes may be elementary or composite

- Attributes have cardinalities specifying how many values the attribute must assume at least and at most in the scope used



Notes:


• Established tuple types must be normalized to:

- Make them convertible into tables

- Remove redundancies within tuple types

- Avoid data inconsistencies caused by insert, update, and delete anomalies

• Normal Forms define states or quality levels for the tuple types

• Five Normal Forms

- Only first three of practical relevance

- Subsequent Normal Form based on previous Normal Form

• First Normal Form requires no repeating groups

• Second Normal Form requires functional dependence of nonkey attributes on entire primary key

• Third Normal Form requires no functional dependence of nonkey attributes on other nonkey attributes

• Properly established data groups/composite attributes help identify attributes to be moved together to new tuple types during normalization

• Update entity-relationship model, problem statement, and data inventory accordingly

Unit 7. From Tuple Types to Tables

This unit describes how you get from the tuple types for the application domain to the tables of the target database management system. It explains how multiple tuple types can be combined into a single tuple type and, thus, become a single table. Conversely, it discusses how a tuple type can be split into multiple tuple types to cope with restrictions imposed by the target database management system.

Furthermore, the unit outlines how the tables and the objects associated with them are established.



• Combine tuple types to reduce the number of tables required.

• Split tuple types to cope with database limitations or performance degradations.

• Denormalize tuple types as required for performance reasons.

• Establish the tables for the tuple types including

- The translation of abstract data types for attributes.

- The definition of data types and column attributes for the columns of the tables.

- The documentation of the necessary database objects.


Accountability:





Notes:

Conceptually, the tuple types established so far could immediately be converted into tables of the target database management system. However, this would result in more tables than necessary making it harder and more expensive than necessary to retrieve and maintain the data for the application domain. Therefore, it is desirable to combine multiple tuple types into a single tuple type, and thus a single table, if possible and reasonable. We will discuss in this unit when tuple types can be combined.

Furthermore, limitations of the target database management system may not allow you to convert the tuple types one-to-one into tables. The limitations as well as performance considerations may force you to split tuple types vertically or horizontally into multiple smaller tuple types which then can be implemented as tables.

Performance considerations may induce you to reverse normalizations you performed and to take care of the resulting problems in a different manner. You might also want to denormalize tuple types that were separate and not created by normalizations.

After these steps, you can establish the tables for the application domain. This includes:

• Implementing the abstract data types for the attributes of the tuple types.

• Determining the data types (abstract or built-in) and column attributes for the columns of the tables.

• Documenting the tables, their columns, and the related database objects for the application domain.

Visual: Unit Objectives (repeats the objectives listed at the beginning of this unit).







7.1 Combining and Splitting Tuple Types



Figure 7-2. Tables in Design Process CF182.0

Notes:

This unit deals with the establishment of the tables for the target database management system. The tables are the containers for the data of the application domain. Thus, we are right in the heart of physical design, i.e., of the storage view.

Visual: Tables in Design Process. Diagram of the design process showing Problem Statement, Data Inventory, Process Inventory, Conceptual View, Tuple Types, Tables, Indexes, and Integrity Rules.


Figure 7-3. Tables for Tuple Types CF182.0

Notes:

Formally, the tuple types established so far can be translated into tables of the target database management system as follows:

• For each tuple type, one table is created. The name for the table must follow the rules for table names of the target database management system. There are length restrictions for table names as well as restrictions on the characters they may include. For example, unless you use delimited identifiers, a table name may not include blanks. However, table names may generally include underscores (_). Thus, it is a good idea to replace blanks in the names of the tuple types by underscores. Delimited identifiers have the disadvantage that you must enclose the name in double quotes in all references in SQL statements.

• Each (direct or indirect) elementary attribute of the tuple type becomes a column of the table for the tuple type.

At present, composite attributes cannot be reflected in tables, only their elementary components, the elementary components of their composite components, and so on. Thus, tables cannot reflect the structure imposed by the composite attributes.

Visual: Tables for Tuple Types

Tuple type AIRCRAFT MODEL: Type Code (PK), Model Number (PK), Dimensions (Length, Height, Wing Span), Weights, Cruising Speed

Table AIRCRAFT_MODEL: TYPE_CODE, MODEL_NUMBER, LENGTH, HEIGHT, WING_SPAN, NET_WEIGHT, MAXIMUM_WEIGHT, CRUISING_SPEED


The columns receive names in the target database management system. To column names, the same restrictions apply as to table names. Furthermore, the names for the columns of a table must be unique. Thus, if the same data group is used multiple times (in different roles) by a tuple type, you must name the components of the corresponding composite attributes differently. One way to achieve this is including the name of the composite attribute (or part of it) in the column name. However, you must ensure that the length restrictions for column names are adhered to.

• As for tuple types, a primary key is established for each table uniquely identifying the rows of the table. The elementary attributes of the primary key for the tuple type become the columns of the primary key for the table.
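The formal translation described above can be sketched as a small program. This is a hedged illustration in Python with the standard sqlite3 module, not a prescribed procedure. The nested-dictionary representation of the tuple type, the function names, the SQL data types, and the use of NOT NULL for cardinality [1..1] are assumptions; the attribute and column names follow the visual.

```python
import sqlite3

# Tuple type AIRCRAFT MODEL. A composite attribute is a nested dict; an
# elementary attribute is (SQL type, belongs to primary key). The SQL
# types are assumptions; the course has not assigned data types yet.
AIRCRAFT_MODEL = {
    "Type Code": ("TEXT", True),
    "Model Number": ("TEXT", True),
    "Dimensions": {
        "Length": ("REAL", False),
        "Height": ("REAL", False),
        "Wing Span": ("REAL", False),
    },
    "Weights": {
        "Net Weight": ("REAL", False),
        "Maximum Weight": ("REAL", False),
    },
    "Cruising Speed": ("REAL", False),
}


def sql_name(name):
    """Replace blanks by underscores so no delimited identifier is needed."""
    return name.upper().replace(" ", "_")


def flatten(attributes):
    """Yield (column, SQL type, is_key) for each elementary attribute,
    whether it is a direct attribute or nested in a composite attribute."""
    for name, spec in attributes.items():
        if isinstance(spec, dict):
            yield from flatten(spec)
        else:
            sql_type, is_key = spec
            yield sql_name(name), sql_type, is_key


def create_table_sql(tuple_type_name, attributes):
    columns = list(flatten(attributes))
    lines = [f"{col} {sql_type} NOT NULL" for col, sql_type, _ in columns]
    key = ", ".join(col for col, _, is_key in columns if is_key)
    lines.append(f"PRIMARY KEY ({key})")
    body = ",\n    ".join(lines)
    return f"CREATE TABLE {sql_name(tuple_type_name)} (\n    {body}\n)"


ddl = create_table_sql("AIRCRAFT MODEL", AIRCRAFT_MODEL)
print(ddl)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
column_names = [row[1] for row in
                conn.execute("PRAGMA table_info(AIRCRAFT_MODEL)")]
print(column_names)
```

Note that the structure imposed by Dimensions and Weights is lost in the table, as the notes point out. If the same data group were used twice in different roles, flatten would produce duplicate column names; in that case, part of the composite attribute name would have to be included in the column name.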





Figure 7-4. Conversion of Tuple Types into Tables CF182.0

Notes:

The bullets in the gray box on this visual have already been described in the student notes for the previous visual.

The elementary attributes for the tuple type are based on data elements in the data inventory. In turn, the data elements are associated with data types. These data types must be reflected in the target database management system. They can be implemented by means of:

• Built-in data types for the target database management system or user defined distinct types. Built-in data types are data types provided by the target database management system. They are also referred to as standard data types. User defined distinct types are data types that you can define yourself based on the built-in data types. Both built-in data types and user defined distinct types will be discussed later in this unit.

• In addition, the implementation of the data types for the data elements may need user defined functions, check constraints, and triggers. User defined functions allow you to perform customized operations for your data. Check constraints allow you to introduce value constraints for the columns of a table. Triggers allow you to perform selected actions as the consequence of database insert, update, or delete operations.

All these items will be discussed later in this unit.

As we mentioned before, the tuple types can formally be translated into tables in the manner described. However, you should further manipulate the tuple types before converting them into tables. The subsequent visuals will discuss why you should do this and what you should do.

Visual: Conversion of Tuple Types into Tables

• Each tuple type becomes a table

- Table name must follow rules for target DBMS

• Each elementary attribute of tuple type becomes a column of the table

- Each elementary component of each composite attribute becomes a column

- Currently, composite attributes themselves cannot be defined in all DBMS, but ...

- Column names must follow rules for target DBMS

• Elementary attributes for primary key of tuple type become primary key for table

• Data types for data elements associated with elementary attributes must be implemented by means of:

- Built-in data types for target DBMS or user defined distinct types

- User defined functions, check constraints, and/or triggers

However . . .
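The three additional means named above (user defined functions, check constraints, and triggers) can be illustrated briefly. The following Python sketch uses the standard sqlite3 module as a stand-in for the target database management system; SQLite has no user defined distinct types, so these are not shown. The function KNOTS_TO_KMH, the log table, the constraints, the assumed unit, and the inserted values are all made up for illustration and are not real aircraft data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# User defined function: a customized operation on the data
# (hypothetical unit conversion, speed assumed to be stored in knots).
conn.create_function("KNOTS_TO_KMH", 1,
                     lambda knots: round(knots * 1.852, 1))

conn.executescript("""
CREATE TABLE AIRCRAFT_MODEL (
    TYPE_CODE      TEXT NOT NULL,
    MODEL_NUMBER   TEXT NOT NULL,
    NET_WEIGHT     REAL NOT NULL CHECK (NET_WEIGHT > 0),
    MAXIMUM_WEIGHT REAL NOT NULL,
    CRUISING_SPEED REAL NOT NULL,
    PRIMARY KEY (TYPE_CODE, MODEL_NUMBER),
    -- Check constraint: a value constraint spanning two columns.
    CHECK (MAXIMUM_WEIGHT >= NET_WEIGHT)
);
CREATE TABLE AIRCRAFT_MODEL_LOG (
    TYPE_CODE TEXT, MODEL_NUMBER TEXT, ACTION TEXT
);
-- Trigger: an action performed as a consequence of an insert.
CREATE TRIGGER AIRCRAFT_MODEL_INSERTED
AFTER INSERT ON AIRCRAFT_MODEL
BEGIN
    INSERT INTO AIRCRAFT_MODEL_LOG
    VALUES (NEW.TYPE_CODE, NEW.MODEL_NUMBER, 'INSERT');
END;
""")

# Made-up values that satisfy the constraints.
conn.execute(
    "INSERT INTO AIRCRAFT_MODEL VALUES ('B747', '400', 100.0, 200.0, 490.0)")

# Made-up values that violate MAXIMUM_WEIGHT >= NET_WEIGHT.
try:
    conn.execute(
        "INSERT INTO AIRCRAFT_MODEL VALUES ('A310', '300', 200.0, 100.0, 450.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

log_rows = conn.execute("SELECT * FROM AIRCRAFT_MODEL_LOG").fetchall()
speed_kmh = conn.execute(
    "SELECT KNOTS_TO_KMH(CRUISING_SPEED) FROM AIRCRAFT_MODEL").fetchone()[0]
print(rejected, log_rows, speed_kmh)
```

The syntax for these objects differs between database management systems, so the statements above should be read as an indication of the idea rather than as statements for your target system.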





Figure 7-5. Problems With One-to-One Conversion CF182.0

Notes:

As described before, the normalized tuple types could formally be converted into tables one-to-one. However, this may result in more tables than required, unnecessarily complicating queries and programs by Join operations. In addition, the Join operations result in performance degradations for queries and programs.

To avoid these problems, tuple types should be combined into a single tuple type where possible and reasonable before converting them into tables.

Size limitations for the target database management system are a second problem preventing the one-to-one conversion of tuple types into tables. Such limitations are upper limits for the row size, the number of columns, or the table size. They may force you to split a tuple type vertically or horizontally into multiple tuple types before creating the tables.

A third consideration is that the resulting tables may have columns with very different usage characteristics and different importance for the application domain. Some of the columns may never be used together. Some of them may only be used by unimportant and not performance-critical business processes whereas the others are used by important, performance-critical processes. In this case, it may also make sense to split the tuple types vertically before creating the tables to separate columns with different usage profiles.

Visual: Problems With One-to-One Conversion

. . . you should further manipulate the tuple types before creating tables . . .

• Direct conversion of normalized tuple types may result in more tables than necessary

- Unnecessary Join operations complicating queries and programs

- Unnecessary Join operations impacting performance

- Combine tuple types before creating tables

• Size limitations of target DBMS may not allow implementation of resulting tables

- Vertically or horizontally split tuple type before creating tables

• Resulting tables may have columns with a different importance for application domain

- Unnecessary retrieval of not required data

- Negative performance impact on applications only using important data

- Vertically split tuple type before creating tables





Figure 7-6. Merging Partial Tuple Types CF182.0

Notes:

Tuple types having the same primary key can be united in a single tuple type if they contain, at all times, tuples with corresponding primary key values. This means that each primary key value in one tuple type also occurs in the other tuple type and vice versa.

The two tuple types can be combined by adding the nonkey attributes of one tuple type to the other tuple type. Note that it may be necessary to rename some of the added attributes.

It does not matter which tuple type is integrated in the other tuple type. In general, you will integrate the tuple type with the smaller number of nonkey attributes in the other tuple type. You may consider renaming the unified tuple type.

Since the original tuple types form parts of the larger, unified, tuple type, the unification is referred to as merging of partial tuple types.

The example on the visual merges tuple types AIRCRAFT and AIRCRAFT MODEL_for_AIRCRAFT being tuple types for an entity type and a relationship type, respectively. For both tuple types, Aircraft Number is the primary key.

Visual: Merging Partial Tuple Types

BEFORE (same primary key, one-to-one correspondence of primary key values)

AIRCRAFT

Aircraft Number   Date Manufactured   Date in Service
B474001323        1994-10-12          1997-01-01
B373004518        1999-02-28          1999-03-15
B373004519        1999-03-31          1999-04-20
A103000534        1998-05-12          1998-07-21
A103003167        1997-08-01          1997-09-01
A402004217        1999-10-23          1999-11-15

AIRCRAFT MODEL_for_AIRCRAFT

Aircraft Number   Type Code   Model Number
B474001323        B747        400
B373004518        B737        300
B373004519        B737        300
A103000534        A310        300
A103003167        A310        300
A402004217        A340        200

AFTER

AIRCRAFT

Aircraft Number   Date Manufactured   Date in Service   Type Code   Model Number
B474001323        1994-10-12          1997-01-01        B747        400
B373004518        1999-02-28          1999-03-15        B737        300
B373004519        1999-03-31          1999-04-20        B737        300
A103000534        1998-05-12          1998-07-21        A310        300
A103003167        1997-08-01          1997-09-01        A310        300
A402004217        1999-10-23          1999-11-15        A340        200


Since relationship type AIRCRAFT MODEL_for_AIRCRAFT is mandatory for entity type AIRCRAFT, a relationship instance must exist for each aircraft expressing to which aircraft model the aircraft belongs. Thus, for each tuple of tuple type AIRCRAFT, a tuple must exist in tuple type AIRCRAFT MODEL_for_AIRCRAFT.

Since relationships always require that the corresponding source and target instances exist, for each instance of relationship type AIRCRAFT MODEL_for_AIRCRAFT, the appropriate aircraft must exist in entity type AIRCRAFT. Consequently, for each tuple in tuple type AIRCRAFT MODEL_for_AIRCRAFT, a tuple with the same primary key value must exist in tuple type AIRCRAFT.

Thus, the two tuple types must always contain tuples with the same primary key values and can be combined. Attributes Type Code and Model Number of tuple type AIRCRAFT MODEL_for_AIRCRAFT are added to tuple type AIRCRAFT to identify the aircraft model for the aircraft.
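The merge and its precondition can be expressed compactly. The following Python sketch is an illustration only; the function name merge_partial and the dictionary representation (primary key value mapped to the nonkey attributes) are assumptions. It uses three of the tuples from the visual and refuses to merge when the primary key values do not correspond one-to-one or when attribute names would clash.

```python
def merge_partial(first, second):
    """Merge two partial tuple types given as
    {primary key value: {nonkey attribute: value}}."""
    if first.keys() != second.keys():
        raise ValueError("primary key values do not correspond one-to-one")
    merged = {}
    for key, attributes in first.items():
        clash = attributes.keys() & second[key].keys()
        if clash:
            raise ValueError(f"rename before merging: {sorted(clash)}")
        merged[key] = {**attributes, **second[key]}
    return merged


# Three of the tuples shown on the visual.
AIRCRAFT = {
    "B474001323": {"Date Manufactured": "1994-10-12",
                   "Date in Service": "1997-01-01"},
    "B373004518": {"Date Manufactured": "1999-02-28",
                   "Date in Service": "1999-03-15"},
    "A103000534": {"Date Manufactured": "1998-05-12",
                   "Date in Service": "1998-07-21"},
}
AIRCRAFT_MODEL_FOR_AIRCRAFT = {
    "B474001323": {"Type Code": "B747", "Model Number": "400"},
    "B373004518": {"Type Code": "B737", "Model Number": "300"},
    "A103000534": {"Type Code": "A310", "Model Number": "300"},
}

merged = merge_partial(AIRCRAFT, AIRCRAFT_MODEL_FOR_AIRCRAFT)
print(merged["B474001323"])

# If an aircraft had no aircraft model, the tuple types could not be
# merged this way.
incomplete = dict(AIRCRAFT_MODEL_FOR_AIRCRAFT)
del incomplete["A103000534"]
try:
    merge_partial(AIRCRAFT, incomplete)
    merge_refused = False
except ValueError:
    merge_refused = True
print(merge_refused)  # True
```

In the design process, the one-to-one correspondence is not tested on data but derived from the entity-relationship model, as explained above; the check in the sketch merely makes the precondition visible.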





Figure 7-7. Finding Partial Tuple Types from ER Model CF182.0

Notes:

For the sample tuple types on the previous visual, we used the entity-relationship model to determine if the tuple types could be combined. This raises the question of whether the entity-relationship model can generally be used to determine the partial tuple types that can be merged. Indeed, the entity-relationship model helps to determine them.

In the following cases, the tuple types for entity types or relationship types represent partial tuple types and can be merged:

• One of the tuple types is for a dependent entity type with a cardinality of 1..1 (for the owning relationship type). The other tuple type may be for an entity type or a relationship type. In this case, the two tuple types can be combined, for example, by integrating the tuple type for the dependent entity type into the other tuple type.

Because of cardinality 1..1 for the dependent entity type, both tuple types have the same primary key: Being a dependent entity type means that its own entity key includes the key of the parent. Because of maximum cardinality 1, the entity key of the dependent entity type need not and must not contain additional attributes.

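Finding Partial Tuple Types from ER Model

[Visual: diagrams of the four constellations described in the notes: a dependent entity type (D) with cardinality 1..1 whose parent is an entity type or relationship type; a relationship type with cardinalities 1..1 and ..m; a relationship type with cardinalities 1..1 and 0..1 (relationship key = key of source); and a relationship type with cardinality 1..1 at both ends (the merge depends on the relationship key selected). Each diagram marks the tuple types (Tuple Type 1, Tuple Type 2, and, in the last case, Tuple Type 3) that can be combined.]
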


Being a dependent entity type also means that, for every entity instance, the parent contains an instance with the corresponding key value. Conversely, the minimum cardinality of 1 requires that the dependent entity type contains an instance for every parent instance.

• One of the tuple types is for a relationship type with cardinality 1..1 for one end (e.g., the target) and maximum cardinality m for the other end. In this case, the tuple types for the relationship type and for the end with maximum cardinality m can be combined. For example, the tuple type for the relationship type can be integrated into the tuple type for the end with maximum cardinality m.

Because of the cardinalities, the key for the relationship type consists of the key of the end with maximum cardinality m. Thus, the corresponding tuple types have the same primary key.

Cardinality 1..1 enforces that, for every instance of the end with maximum cardinality m, the relationship type contains one and only one instance with the same key values. Since source and target must exist for relationship instances, the end with maximum cardinality m must contain, for every relationship instance, an instance with the same key value. Thus, the corresponding tuple types are partial tuple types and can be combined. For example, the tuple type for the relationship type can be integrated into the tuple type for the end with maximum cardinality m.

This is the constellation shown on the previous visual.

• One of the tuple types is for a relationship type with cardinality 1..1 for one end (e.g., the target) and cardinality 0..1 for the other end. In addition, the key of the relationship type has been chosen to be the key of the end with cardinality 0..1. In this case, the tuple types for the relationship type and for the end with cardinality 0..1 are partial tuple types and can be combined. For example, the tuple type for the relationship type can be integrated into the tuple type for the end with cardinality 0..1.

Since the key of the end with cardinality 0..1 has been chosen as relationship key, the primary keys of the two tuple types are the same. (Note that there was a choice for the relationship key because both maximum cardinalities were 1.) For the same reasons as for the previous case, the two tuple types must at all times have corresponding primary key values.

• One of the tuple types is for a relationship type with cardinality 1..1 for both ends. In this case, the tuple type for the relationship type can be combined with the tuple type for the source or with the tuple type for the target. Which of the two it can be combined with depends on which key has been made the relationship key: If the key of the source has been selected, the tuple type for the relationship type and the tuple type for the source can be combined. If the key of the target has been selected, the tuple type for the relationship type and the tuple type for the target can be combined.

• As you can imagine, combinations of the above cases may lead to cascaded mergers of tuple types.




Theoretically, it is possible that other tuple types can be combined as well. However, you should only combine tuple types that can be combined directly or through several mergers. Tuple types that cannot be combined by subsequent mergers have nothing to do with each other. They lead to columns in tables that are never used together and, therefore, may negatively impact performance.

It must be decided from case to case whether or not the combination of the tuple types should be reflected in the entity-relationship model. If tuple types for relationship types are involved, you do not want to reflect the merging of the tuple types in the entity-relationship model. The entity-relationship model would no longer correctly describe the interrelationships between entity types and relationship types.




Figure 7-8. Imbedding Detail Tuple Types CF182.0

Notes:

Tuple type T2 can be imbedded into tuple type T1 if:

1. Both tuple types have the same primary key.

2. The primary key values of T2 form, at all times, a subset of the primary key values of T1.

3. For each tuple of T2, at least one of the nonkey attributes has a value. It need not necessarily be the same attribute for all tuples.

The resulting extended tuple type T1 contains all attributes it contained before and the nonkey attributes of tuple type T2. Note that it may be necessary to rename some of the added attributes.

Tuples of old tuple type T1 not having a counterpart in T2 do not have a value for any attributes added to new tuple type T1. Tuples of old tuple type T1 with a counterpart in T2 have a value for at least one attribute added to new tuple type T1 (third condition).

After the elimination of T2, it is still possible to determine the original tuple types (and, thus, entity types or relationship types) for the various tuples. Thus, their (original) identity has been preserved and no information has been lost.

Imbedding Detail Tuple Types

BEFORE

ENGINE

Engine Number  Engine Type  Manufacturer Code
PW9880193      PW4062       PW
PW9880194      PW4062       PW
PW9880195      PW4062       PW
PW9882345      PW4062       PW
PW9974034      PW4062       PW
A862946RR      RB211-524    RR
A59A350RR      RB211-524    RR
R375184566     CF6-80C2     GE
R375184567     CF6-80C2     GE
R375184568     CF6-80C2     GE

ENGINE_on_AIRCRAFT

Engine Number  Aircraft Number  Engine Position
PW9880193      B474001323       1
PW9880194      B474001323       2
PW9882345      B474001323       3
PW9974034      B474001323       4
R375184566     A103003167       1
R375184568     A103003167       2

Same primary key. At least one nonkey column always contains a value.

AFTER

ENGINE

Engine Number  Engine Type  Manufacturer Code  Aircraft Number  Engine Position
PW9880193      PW4062       PW                 B474001323       1
PW9880194      PW4062       PW                 B474001323       2
PW9880195      PW4062       PW
PW9882345      PW4062       PW                 B474001323       3
PW9974034      PW4062       PW                 B474001323       4
A862946RR      RB211-524    RR
A59A350RR      RB211-524    RR
R375184566     CF6-80C2     GE                 A103003167       1
R375184567     CF6-80C2     GE
R375184568     CF6-80C2     GE                 A103003167       2




Since the tuples of T2 provide additional details for tuples of T1, tuple type T2 is referred to as a detail tuple type.

In the example on the visual, tuple type ENGINE_on_AIRCRAFT is a detail tuple type for tuple type ENGINE. It provides further detail information for engines, namely, where they are mounted. ENGINE_on_AIRCRAFT was created in Unit 6 - Tuple Types as a consequence of normalization.

Both tuple types have the same primary key Engine Number. Since not all engines are mounted on aircraft, the primary key values of ENGINE_on_AIRCRAFT form a subset of the primary key values of tuple type ENGINE. Attribute Aircraft Number of tuple type ENGINE_on_AIRCRAFT always has a value so that the third condition for the imbedding of tuple types is satisfied. Accordingly, ENGINE_on_AIRCRAFT is indeed a detail tuple type of ENGINE and can be imbedded.

Resulting new tuple type ENGINE contains all attributes it had before plus the nonkey attributes of ENGINE_on_AIRCRAFT. Tuples of old tuple type ENGINE that did not have a counterpart in ENGINE_on_AIRCRAFT do not have values for the attributes added to tuple type ENGINE. Tuples that had a counterpart in ENGINE_on_AIRCRAFT have values for the added attributes.
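The three conditions and the imbedding itself can be sketched as follows, using Python's sqlite3 module and a few of the engines from the visual. The table and column names are hypothetical renderings of the tuple types, and a left outer join is used here as one possible way to express the imbedding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE engine (
    engine_number     TEXT PRIMARY KEY,
    engine_type       TEXT,
    manufacturer_code TEXT
);
CREATE TABLE engine_on_aircraft (
    engine_number   TEXT PRIMARY KEY,   -- condition 1: same primary key as ENGINE
    aircraft_number TEXT NOT NULL,
    engine_position INTEGER
);
INSERT INTO engine VALUES
    ('PW9880193',  'PW4062',   'PW'),
    ('PW9880195',  'PW4062',   'PW'),
    ('R375184566', 'CF6-80C2', 'GE'),
    ('R375184567', 'CF6-80C2', 'GE');
INSERT INTO engine_on_aircraft VALUES
    ('PW9880193',  'B474001323', 1),
    ('R375184566', 'A103003167', 1);
""")

t1_keys = {r[0] for r in conn.execute("SELECT engine_number FROM engine")}
t2_keys = {r[0] for r in conn.execute("SELECT engine_number FROM engine_on_aircraft")}

# Condition 2: the key values of T2 form a subset of the key values of T1.
keys_are_subset = t2_keys <= t1_keys

# Condition 3: no tuple of T2 may have all nonkey attributes empty.
rows_without_value = conn.execute(
    "SELECT COUNT(*) FROM engine_on_aircraft "
    "WHERE aircraft_number IS NULL AND engine_position IS NULL").fetchone()[0]

# Imbedding: every ENGINE tuple is kept; detail columns stay empty (NULL)
# for engines that have no counterpart in ENGINE_on_AIRCRAFT.
imbedded = conn.execute("""
SELECT e.engine_number, e.engine_type, e.manufacturer_code,
       d.aircraft_number, d.engine_position
FROM engine e
LEFT JOIN engine_on_aircraft d ON d.engine_number = e.engine_number
ORDER BY e.engine_number
""").fetchall()
```

In the result, a tuple with an empty Aircraft Number is an engine that is not mounted, which is how the original tuple types remain distinguishable after the imbedding.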




Figure 7-9. Finding Detail Tuple Types from ER Model CF182.0

Notes:

As for partial tuple types, the entity-relationship model can be used to determine the detail tuple types that can be imbedded into other tuple types.

In the following cases, the tuple types for entity types or relationship types represent detail tuple types and can be imbedded in other tuple types:

• One of the tuple types is for a dependent entity type with a cardinality of 0..1 (for the owning relationship type). The other tuple type may be for an entity type or a relationship type. In addition, for each instance of the dependent entity type, at least one nonkey attribute must always have a value. In this case, the tuple type for the dependent entity type can be imbedded in the tuple type for the parent.

Because of cardinality 0..1 for the dependent entity type, both tuple types have the same primary key: Being a dependent entity type means that its own entity key includes the key of the parent. Because of maximum cardinality 1, the entity key of the dependent entity type need not and must not contain additional attributes.

Finding Detail Tuple Types from ER Model

[Visual: diagrams of the four constellations described in the notes: a dependent entity type (D) with cardinality 0..1 whose parent is an entity type or relationship type (for each instance, a nonkey attribute must have a value); a relationship type with cardinalities 0..1 and ..m; a relationship type with cardinalities 0..1 and 1..1 (relationship key = key of source); and a relationship type with cardinality 0..1 at both ends (the imbedding depends on the relationship key selected). For the relationship types, a defining attribute not belonging to the key always has a value. Each diagram marks the tuple types (Tuple Type 1, Tuple Type 2, and, in the last case, Tuple Type 3) involved.]





Being a dependent entity type also means that, for every entity instance, the parent contains an instance with the corresponding key value. Minimum cardinality 0 permits that the dependent entity type does not contain an instance for every parent instance.

• One of the tuple types is for a relationship type with cardinality 0..1 for one end (e.g., the target) and maximum cardinality m for the other end. In this case, the tuple type for the relationship type can be imbedded in the tuple type for the end with maximum cardinality m.

Because of the cardinalities, the key for the relationship type consists of the key of the end with maximum cardinality m. Thus, the corresponding tuple types have the same primary key.

Cardinality 0..1 permits that the relationship type does not contain an instance for every instance of the end with maximum cardinality m. Since source and target must exist for a relationship instance, the end with maximum cardinality m must contain, for every relationship instance, an instance with the same key value. Since the defining attributes not being part of the relationship key contain a value for every relationship instance, the third condition for detail tuple types is automatically satisfied. Thus, the tuple type for the relationship type is a detail tuple type. It can be imbedded in the tuple type for the end with maximum cardinality m.

• One of the tuple types is for a relationship type with cardinality 0..1 for one end (e.g., the target) and cardinality 1..1 for the other end. In addition, the key of the relationship type has been chosen to be the key of the end with cardinality 1..1. In this case, the tuple type for the relationship type is a detail tuple type and can be imbedded in the tuple type for the end with cardinality 1..1.

Since the key of the end with cardinality 1..1 has been chosen as relationship key, the primary keys of the two tuple types are the same. (Note that there was a choice for the relationship key because both maximum cardinalities were 1.)

• One of the tuple types is for a relationship type with cardinality 0..1 at both ends. In this case, the tuple type for the relationship type can be imbedded in the tuple type for the source or in the tuple type for the target: If the key of the source has been selected as relationship key, the tuple type for the relationship type can be imbedded in the tuple type for the source. If the key of the target has been selected as relationship key, the tuple type for the relationship type can be imbedded in the tuple type for the target.

• As you can imagine, combinations of the above cases may lead to cascaded imbeds of tuple types.

Theoretically, other cases are possible. However, in cases that are not equivalent to cascaded imbeds, you should not imbed the detail tuple type. The two tuple types concerned have nothing to do with each other. Imbedding the detail tuple type leads to columns in tables that are never used together and, therefore, may negatively impact performance.

It must be decided from case to case whether or not the combination of the tuple types should be reflected in the entity-relationship model. If tuple types for relationship types are involved, you do not want to reflect the imbedding of tuple types in the entity-relationship model. The entity-relationship model would no longer correctly describe the interrelationships between entity types and relationship types.

As a conclusion of the previous three visuals, you can say:

Tuple types for 1:1 or 1:m relationship types can always be merged or imbedded.





Figure 7-10. Decomposition of Super Tuple Types (1 of 2) CF182.0

Notes:

Let T, T1, T2, ..., Tn be tuple types with the following characteristics:

• All tuple types have the same primary key.

• At all times, each primary key value of T occurs in at most one of the tuple types T1 through Tn. This means that the primary key value sets of T1 through Tn are disjoint.

• At all times, the primary key values of T1 through Tn occur in tuple type T.

By adding the nonkey attributes of T to each of the tuple types T1 through Tn, the primary key value sets of T and T1 through Tn can be made disjoint. The tuples of T with counterparts in T1 through Tn are removed from T and combined with the appropriate tuples in T1 through Tn.

T1 through Tn are called a (partial) decomposition of T. Since the role of tuple type T has changed, you should consider renaming it to correctly reflect its changed role.

If, at all times, each primary key value of T occurs in one of the tuple types T1 through Tn, tuple type T can be eliminated. T1 through Tn then form a perfect decomposition of T.

Decomposition of Super Tuple Types (1 of 2)

BEFORE

EMPLOYEE

Employee Number  Last Name   First Name  Date of Birth
4627953          Miller      Jonathan    1968-02-29
7003001          Ambrose     Anna        1980-05-12
0562091          Repairmaid  Susan       1975-03-17
2342007          Handyman    Peter       1974-04-20
0491337          Miller      Jack        1961-07-21
1662951          Smith       Joe         1962-09-01
0844092          Ferguson    Jane        1965-04-15

MECHANIC

Employee Number  Date of Certification
2342007          1998-03-31
0562091          1999-02-25

PILOT

Employee Number  Pilot Level
0844092          Copilot
1662951          Captain
0491337          Captain

• All tuple types have the same primary key

• Key values of first tuple type occur in at most one of the other tuple types

• All key values of other tuple types occur in first tuple type




The situation described here exists for class (supertype/subtype) structures with exclusive subtype sets. For this reason, tuple type T is referred to as a super tuple type. If the subtype set is also covering, the super tuple type can be eliminated.

The example on the visual illustrates the tuple types for a class structure with an exclusive, but not covering, subtype set. The employees of Come Aboard may be pilots, mechanics, or other types of employees. However, they may not be pilots and mechanics at the same time. Since pilots and mechanics are also employees, each tuple of PILOT or MECHANIC has a counterpart in EMPLOYEE.

As illustrated on the next visual, the primary key value sets of EMPLOYEE, PILOT, and MECHANIC can be made disjoint.





Figure 7-11. Decomposition of Super Tuple Types (2 of 2) CF182.0

Notes:

After the decomposition, tuple types PILOT and MECHANIC include all nonkey attributes of EMPLOYEE (e.g., Last Name, First Name, and Date of Birth). Tuple type EMPLOYEE has been renamed to OTHER EMPLOYEE to emphasize its changed role. Now, an employee is either in OTHER EMPLOYEE or in PILOT or in MECHANIC, but not in more than one.
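The decomposition can be sketched in plain Python with the sample data from the two visuals. The dictionary layout and variable names are assumptions made only for this illustration; each dictionary maps an Employee Number to the nonkey attributes of one tuple type.

```python
# Super tuple type T: Employee Number -> (Last Name, First Name, Date of Birth)
employee = {
    "4627953": ("Miller", "Jonathan", "1968-02-29"),
    "7003001": ("Ambrose", "Anna", "1980-05-12"),
    "0562091": ("Repairmaid", "Susan", "1975-03-17"),
    "2342007": ("Handyman", "Peter", "1974-04-20"),
    "0491337": ("Miller", "Jack", "1961-07-21"),
    "1662951": ("Smith", "Joe", "1962-09-01"),
    "0844092": ("Ferguson", "Jane", "1965-04-15"),
}
# Subtype tuple types T1 and T2 with the same primary key.
mechanic = {"2342007": "1998-03-31", "0562091": "1999-02-25"}
pilot = {"0844092": "Copilot", "1662951": "Captain", "0491337": "Captain"}

subtype_keys = list(mechanic) + list(pilot)

# Each key value of T occurs in at most one of T1 through Tn.
exclusive = len(subtype_keys) == len(set(subtype_keys))
# All key values of T1 through Tn occur in T.
contained = set(subtype_keys) <= set(employee)

# Add the nonkey attributes of T to each subtype tuple ...
mechanic_after = {k: (cert,) + employee[k] for k, cert in mechanic.items()}
pilot_after = {k: (level,) + employee[k] for k, level in pilot.items()}
# ... and remove the moved tuples from T (renamed OTHER EMPLOYEE).
other_employee = {k: v for k, v in employee.items() if k not in set(subtype_keys)}

# The decomposition is perfect only if nothing remains in T.
perfect = not other_employee
```

With this data, `other_employee` still holds two tuples, so the decomposition is partial and the renamed tuple type is still needed.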

If the employees of Come Aboard could only be pilots or mechanics, tuple type OTHER EMPLOYEE would not be needed, i.e., tuple type EMPLOYEE would be eliminated completely.

You should note that, for the illustrated tuple types, generally, you would not perform a decomposition of the super tuple type.

If the decomposition is a perfect decomposition, it should be reflected in the entity-relationship model.

Decomposition of Super Tuple Types (2 of 2)

AFTER

OTHER EMPLOYEE

Employee Number  Last Name  First Name  Date of Birth
4627953          Miller     Jonathan    1968-02-29
7003001          Ambrose    Anna        1980-05-12

MECHANIC

Employee Number  Date of Certification  Last Name   First Name  Date of Birth
2342007          1998-03-31             Handyman    Peter       1974-04-20
0562091          1999-02-25             Repairmaid  Susan       1975-03-17

PILOT

Employee Number  Pilot Level  Last Name  First Name  Date of Birth
0844092          Copilot      Ferguson   Jane        1965-04-15
1662951          Captain      Smith      Joe         1962-09-01
0491337          Captain      Miller     Jack        1961-07-21




Figure 7-12. Combining Tuple Types - Considerations CF182.0

Notes:

The merging, imbedding, and decomposition of tuple types described on the preceding visuals can be performed without the loss of information. However, there are a few things to be considered which may make you not combine the tuple types:

• Do not combine tuple types which have nothing to do with each other; whose attributes are never processed together; or whose attributes are only processed together by business processes that are not performance-critical.

If you combined the tuple types, other critical business processes might experience a performance degradation. The rows for the appropriate tables would become longer resulting in fewer rows per page (physical blocks) and, thus, fewer rows per buffer. This might increase the number of I/O operations required when processing or searching the table sequentially.

• When imbedding a detail tuple type, other tuple types should not be referentially dependent on the detail tuple type. A tuple type is referentially dependent on another tuple type if the values of one or more of its attributes must always be a subset of the values of a corresponding set of attributes of the other tuple type.

Combining Tuple Types - Considerations

Do not combine tuple types if their attributes are not processed together or only by not performance-critical business processes

Otherwise, other critical business processes may experience a degradation

When imbedding detail tuple types, other tuple types should not be referentially dependent on the detail tuple type

Otherwise, referential integrity cannot be enforced by referential integrity support of target DBMS

When decomposing super tuple types, other tuple types should not be referentially dependent on the super tuple type

Otherwise, referential integrity cannot be enforced by referential integrity support of target DBMS

When combining tuple types, size limitations for the target DBMS may become effective forcing you to split the tuple type again

When combining tuple types, limitations for the referential integrity support of the target DBMS may become effective not existing otherwise

For example, restrictions for referential cycles and delete-connected tables




If you imbed a detail tuple type on which other tuple types are referentially dependent, their referential integrity can no longer be enforced by means of the referential integrity support of the target database management system. You then must use other means to ensure the integrity of the data (e.g., program logic or, if supported, triggers).

• When decomposing a super tuple type, other tuple types should not be referentially dependent on the super tuple type.

If you decompose the super tuple type, the referential integrity of dependent tuple types can no longer be enforced by means of the referential integrity support of the target database management system.

• When combining tuple types, restrictions or limitations for the referential integrity support of the target database management system may become effective which would not exist otherwise. These limitations deal with referential cycles and delete-connected tables. Referential cycles and delete-connected tables will be discussed in a later unit.

These restrictions can also become effective when you merge or imbed the tuple types for 1:1 or 1:m relationship types, in which case you might decide not to merge or imbed them.

• When combining tuple types, size limitations for the target database management system may become effective forcing you not to combine the tuple types or to split them differently.




Figure 7-13. Limitations and Consequences CF182.0

Notes:

All database management systems have limitations. The above visual illustrates the typical limitations.

Most database management systems store the rows of tables in fixed-length pages, i.e., blocks of a fixed length. In general, a row must fit into a single page. The page size can be chosen from predefined values and is the same for all pages of a table (or a set of tables). For DB2 Universal Database, for example, the page size can be 4096, 8192, 16384, or 32768 bytes.

The selection of a page size causes two problems:

• The maximum length of a row is restricted by the chosen page size. As a solution, you could choose a bigger page size provided the target database management system supports a bigger page size.

However, for a few exceptional rows, you do not always want to choose a larger page size. For the direct retrieval of rows, a larger page size may mean that you read more data than necessary for the majority of rows. The I/O operation for the larger page size takes longer, resulting in an undesirable performance degradation. Even for sequential retrieval, a larger page size can negatively impact the overall system performance since it may hamper concurrent requests for other tables.

• The fixed page size may result in a lot of unused space. Assume that all rows for a table have the same fixed length and that the length is just a little over half a page. As a consequence, only a single row fits into a page and nearly half the page is wasted. If the row size is just over one third of the page size, you waste about one third of the space, and so on. The smaller the row size, the less space is wasted.

Limitations and Consequences

Typical Limitations of Database Management Systems

• There is an upper limit for the number of columns a table can have

• There is an upper limit for the amount of space that a table can occupy

• Most database management systems store rows for tables into fixed-length pages. The rows must fit into a single page

  - Maximum row length limited by chosen page size

  - Unused space if fixed-length rows are just a little longer than half the page size, a third of the page size, and so on

Possible Consequences

• Cannot combine tuple types in a table that could be combined otherwise

• Cannot denormalize tuple types

• Must perform additional normalizations of tuple types

• Must vertically split tuple types

• Must horizontally split tuple types
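The unused-space effect described in the notes for fixed-length rows can be checked with a little arithmetic. The sketch below is a deliberate simplification: it ignores the page header and per-row overhead that a real database management system adds, and the function name and row lengths are made up for the illustration.

```python
def page_usage(page_size, row_length):
    """Return (rows per page, wasted fraction of the page) for fixed-length rows."""
    rows_per_page = page_size // row_length      # a row must fit into a single page
    wasted_bytes = page_size - rows_per_page * row_length
    return rows_per_page, wasted_bytes / page_size


# Row lengths just over 1/2, 1/3, and 1/4 of a 4096-byte page, plus a short row.
usage = {length: page_usage(4096, length) for length in (2049, 1366, 1025, 100)}
for length, (rows, wasted) in usage.items():
    print(f"row length {length:5d}: {rows:3d} row(s) per page, {wasted:.1%} unused")
```

A row of 2049 bytes leaves almost half of each 4096-byte page unused, while a 100-byte row wastes only a few percent, which matches the statement that smaller rows waste less space.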

As a second limitation, there is typically an upper limit for the number of columns that a table can have.

The third limitation common to all database management systems is that there is an upper limit for the amount of space a table can occupy. In the course of time, the last two limitations have been relaxed and will be relaxed even more.

If you hit one of the limitations mentioned above, the consequences are that:

• You cannot combine tuple types that could be combined otherwise.

• You cannot denormalize tuple types although you would like to.

• You must perform additional normalizations you did not want to do.

• You must vertically split tuple types.

• You must horizontally split tuple types.




Figure 7-14. Denormalization CF182.0

Notes:

Let T1 and T2 be tuple types satisfying the following conditions:

• T1 and T2 have different primary keys.

• T1 has a set of attributes corresponding to the primary key of tuple type T2.

• At all times, the primary key of T2 and the corresponding attributes of T1 contain the same values.

In this case, tuple type T2 can be integrated into tuple type T1 without loss of information by adding the nonkey attributes of T2 to T1. However, as a consequence, information may have to be stored redundantly in the integrated attributes.

This process is called denormalization since it represents a conscious violation of the Second Normal Form or the Third Normal Form.

Frequently, the primary key values of T2 form a superset of the values of the corresponding attributes of T1. In this case, you must decide if you can do without the tuples of T2 which do not have a counterpart in T1. This means you accept the loss of information.

Denormalization

BEFORE

AIRCRAFT

Aircraft Number  Date Manufactured  Date in Service  Type Code  Model Number
B474001323       1994-10-12         1997-01-01       B747       400
B373004518       1999-02-28         1999-03-15       B737       300
B373004519       1999-03-31         1999-04-20       B737       300
A103000534       1998-05-12         1998-07-21       A310       300
A103003167       1997-08-01         1997-09-01       A310       300
A402004217       1999-10-23         1999-11-15       A340       200

AIRCRAFT MODEL

Type Code  Model Number  Length  Height
A340       200           59.40   16.91
A310       300           46.67   15.81
B737       300           33.41   11.13
B747       400           70.67   19.33

AFTER

AIRCRAFT

Aircraft Number  Date Manufactured  Date in Service  Type Code  Model Number  Length  Height
B474001323       1994-10-12         1997-01-01       B747       400           70.67   19.33
B373004518       1999-02-28         1999-03-15       B737       300           33.41   11.13
B373004519       1999-03-31         1999-04-20       B737       300           33.41   11.13
A103000534       1998-05-12         1998-07-21       A310       300           46.67   15.81
A103003167       1997-08-01         1997-09-01       A310       300           46.67   15.81
A402004217       1999-10-23         1999-11-15       A340       200           59.40   16.91




The example on the visual integrates tuple type AIRCRAFT MODEL into tuple type AIRCRAFT. Together, attributes Type Code and Model Number of tuple type AIRCRAFT (T1) correspond to the primary key of tuple type AIRCRAFT MODEL (T2). Since the source cardinality for relationship type AIRCRAFT MODEL_for_AIRCRAFT, in the entity-relationship model for Come Aboard, is 1..1, there is an aircraft model for every aircraft. Thus, each value of attribute pair (Type Code, Model Number) in T1 occurs as primary key value of T2. However, because of target cardinality m for the relationship type, there need not be an aircraft for every aircraft model. Consequently, AIRCRAFT MODEL cannot be integrated in AIRCRAFT unless you decide not to keep information about aircraft models for which there is no aircraft.

Since denormalization can be seen as a reversal of normalization, it reintroduces the problems you tried to solve by normalization:

• Since every primary key value of T2 may occur multiple times in T1, information is redundantly stored in the resulting combined tuple type. Consequently, you must ensure that the attributes of T2 added to T1 are changed for all tuples with the same primary key value of T2 at the same time. This can be achieved by using proper mass UPDATE SQL statements for the (table of the) resulting tuple type.

Similarly, when adding a new tuple, it must be ensured that redundant information is consistent with information already contained in existing tuples. This can be achieved by copying the corresponding information from the existing tuples rather than entering it again.

To reduce the risk of inconsistent redundant information as much as possible, you should not allow end users to issue UPDATE or INSERT SQL statements against tables that are not normalized. Rather, you should provide front-ends (to be used by the end users) that include the proper UPDATE and INSERT statements.

• If the last tuple for a former primary key value of T2 is deleted, all T2-related information for this value is lost. Similarly, you cannot add information about a new primary key value of T2 without adding T1-related information at the same time.

For the example on the visual, when you delete the last Boeing 747, Model 400 aircraft (B474001323), the information about the aircraft model is lost as well. Also, as outlined above, you cannot add information about a new aircraft model without entering information about an aircraft for that aircraft model at the same time.
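The two problems listed above can be reproduced on a small denormalized AIRCRAFT table. The following sketch uses Python's sqlite3 module as a stand-in for the target database management system. The column names, the new height value, and aircraft number B373009999 are made up for the illustration, and the date columns are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aircraft (          -- denormalized: model data repeated per aircraft
    aircraft_number TEXT PRIMARY KEY,
    type_code       TEXT,
    model_number    TEXT,
    length          REAL,
    height          REAL
);
INSERT INTO aircraft VALUES
    ('B474001323', 'B747', '400', 70.67, 19.33),
    ('B373004518', 'B737', '300', 33.41, 11.13),
    ('B373004519', 'B737', '300', 33.41, 11.13);
""")

# Mass UPDATE: change the redundant attribute for ALL aircraft of the model
# in one statement so that the copies cannot drift apart.
cur = conn.execute(
    "UPDATE aircraft SET height = ? WHERE type_code = ? AND model_number = ?",
    (11.20, "B737", "300"))          # 11.20 is a hypothetical corrected value
rows_updated = cur.rowcount

# INSERT: copy the model data from an existing tuple instead of re-entering it.
conn.execute("""
INSERT INTO aircraft
SELECT ?, type_code, model_number, length, height
FROM aircraft
WHERE type_code = ? AND model_number = ?
LIMIT 1
""", ("B373009999", "B737", "300"))  # hypothetical new aircraft number

b737_heights = {r[0] for r in conn.execute(
    "SELECT height FROM aircraft WHERE type_code = 'B737'")}

# Deletion problem: removing the last B747-400 also removes the model data.
conn.execute("DELETE FROM aircraft WHERE aircraft_number = 'B474001323'")
b747_rows_left = conn.execute(
    "SELECT COUNT(*) FROM aircraft "
    "WHERE type_code = 'B747' AND model_number = '400'").fetchone()[0]
```

After the DELETE, no row carries the length and height of the Boeing 747, Model 400 any more, which is exactly the loss of information described in the notes.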

When denormalizing tuple types, other tuple types should not be referentially dependent on the integrated tuple types. Otherwise, the referential integrity of dependent tuple types can no longer be enforced by means of the referential integrity support of the target database management system.

If you look at the entity-relationship model for Come Aboard, you will see that entity type AIRCRAFT MODEL is source or target of many relationship types. This means that many tuple types are referentially dependent on it. Therefore, you would never integrate tuple type AIRCRAFT MODEL into tuple type AIRCRAFT.




It must be decided from case to case whether the denormalization should be reflected in the entity-relationship model. It should be reflected in the entity-relationship model if it combines the tuple types for two entity types.

The primary reason for denormalization is performance. However, because of the problems involved with denormalization, you should investigate very carefully whether the gain is worth the trouble. If the table for the integrated tuple type always contains only very few rows (e.g., just a page), denormalization will not gain much. After the first request, the page will be in the buffers of the target database management system. Immediately subsequent requests will not require an I/O operation. Also, locating the appropriate rows in the page does not dramatically add to the processor time. However, to come to a reliable decision, you should use the tools provided by the target database management system (such as EXPLAIN) to determine the behavior of critical requests.
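As a rough illustration of inspecting a critical request before deciding on denormalization, the sketch below asks SQLite for its access plan with EXPLAIN QUERY PLAN. This is only a stand-in: the EXPLAIN facility of the target database management system has its own syntax and output, the table definitions are hypothetical, and the wording of the plan steps varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aircraft_model (
    type_code    TEXT,
    model_number TEXT,
    length       REAL,
    height       REAL,
    PRIMARY KEY (type_code, model_number)
);
CREATE TABLE aircraft (
    aircraft_number TEXT PRIMARY KEY,
    type_code       TEXT,
    model_number    TEXT
);
""")

# Ask how the normalized join would be executed for one aircraft.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT a.aircraft_number, m.length, m.height
FROM aircraft a
JOIN aircraft_model m
  ON m.type_code = a.type_code AND m.model_number = a.model_number
WHERE a.aircraft_number = 'B474001323'
""").fetchall()

# The last column of each row holds the human-readable plan step.
plan_steps = [row[-1] for row in plan]
for step in plan_steps:
    print(step)
```

If the plan shows that both tables are reached through their keys, the join is likely cheap, and the case for denormalizing is correspondingly weaker.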





Figure 7-15. Vertical Splitting of Tuple Types CF182.0

Notes:

Vertical splitting of a tuple type means that some attributes of the tuple type are moved to a new tuple type with the same primary key. Of course, you should not arbitrarily split a tuple type, but rather move attributes that logically belong together to the new tuple type. The composite attributes for a tuple type identify attributes that belong together. They are a big help when splitting a tuple type.

Limitations of the target database management system are one reason for splitting tuple types. Another, equally important, reason is that the attributes of the tuple type have different usage profiles:

• Some attributes are never used together with other attributes.

• Some attributes are used very seldom and, then, together with other attributes, only in business processes that are not performance-critical.

In these cases, splitting the tuple type may increase the performance of other, performance-critical, business processes. As a consequence of the splitting, the rows for the important tables become shorter and more rows will fit into a page. Thus, more rows can be made available with a single I/O operation and kept in buffers of the target database management system.

Vertical Splitting of Tuple Types

BEFORE

AIRCRAFT MODEL (Length, Height, and Wing Span form composite attribute Dimensions)

Type Code  Model Number  Length  Height  Wing Span  Net Weight  Maximum Weight  Cruising Speed  Range
A340       200           59.40   16.91   60.30      156500      274980          890             14800
A310       300           46.67   15.81   43.90      93710       164400          860             9600
B737       300           33.41   11.13   28.88      35805       62820           795             4175
B747       400           70.67   19.33   64.31      226237      396890          930             13570

AFTER

AIRCRAFT MODEL DIMENSIONS

Type Code  Model Number  Length  Height  Wing Span
A340       200           59.40   16.91   60.30
A310       300           46.67   15.81   43.90
B737       300           33.41   11.13   28.88
B747       400           70.67   19.33   64.31

AIRCRAFT MODEL

Type Code  Model Number  Net Weight  Maximum Weight  Cruising Speed  Range
A340       200           156500      274980          890             14800
A310       300           93710       164400          860             9600
B737       300           35805       62820           795             4175
B747       400           226237      396890          930             13570

You should note, however, that vertical splitting only makes sense if the rows for the corresponding table are not already very small. Some database management systems limit the maximum number of rows per page. Thus, if the row size becomes too small, you lose space without gaining performance.

When vertically splitting a tuple type, you effectively create a new dependent entity type. The dependent entity type should be reflected in the entity-relationship model.

In the example on the visual, the dimensions for aircraft models are removed from tuple type AIRCRAFT MODEL. They are moved to a new tuple type called AIRCRAFT MODEL DIMENSIONS. The dimensions are less frequently used than the weights for the aircraft models. Dimensions was a composite attribute of old tuple type AIRCRAFT MODEL.

Vertical splitting is the inverse of merging and imbedding of tuple types. If the original tuple type contained tuples not having a value for any of the removed attributes, the new dependent tuple type contains fewer tuples than the parent tuple type. You need not and should not keep tuples just consisting of a value for the primary key and not containing other useful information.
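The split described above can be tried out with a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the target database management system. The column names follow the visual, some columns are omitted, and the data types are chosen for illustration only.

```python
import sqlite3

# Assumption: SQLite stands in for the target database management system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AIRCRAFT_MODEL (
        TypeCode       TEXT    NOT NULL,
        ModelNumber    TEXT    NOT NULL,
        Net_Weight     INTEGER NOT NULL,
        Maximum_Weight INTEGER NOT NULL,
        -- remaining frequently used columns omitted for brevity
        PRIMARY KEY (TypeCode, ModelNumber)
    );
    -- The new dependent table has the same primary key as its parent.
    CREATE TABLE AIRCRAFT_MODEL_DIMENSIONS (
        TypeCode        TEXT NOT NULL,
        ModelNumber     TEXT NOT NULL,
        Length_of_Model REAL NOT NULL,
        Height_of_Model REAL NOT NULL,
        Wing_Span       REAL NOT NULL,
        PRIMARY KEY (TypeCode, ModelNumber),
        FOREIGN KEY (TypeCode, ModelNumber)
            REFERENCES AIRCRAFT_MODEL (TypeCode, ModelNumber)
    );
""")
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?)",
    [("A340", "200", 156500, 274980), ("B737", "300", 35805, 62820)],
)
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL_DIMENSIONS VALUES (?, ?, ?, ?, ?)",
    [("A340", "200", 59.40, 16.91, 60.30), ("B737", "300", 33.41, 11.13, 28.88)],
)

# Performance-critical requests touch only the short rows of the parent table.
weights = conn.execute(
    "SELECT TypeCode, Net_Weight FROM AIRCRAFT_MODEL ORDER BY TypeCode"
).fetchall()

# The rarely needed dimensions are joined in through the shared primary key.
joined = conn.execute("""
    SELECT m.TypeCode, m.ModelNumber, m.Net_Weight, d.Wing_Span
    FROM AIRCRAFT_MODEL AS m
    JOIN AIRCRAFT_MODEL_DIMENSIONS AS d
      ON d.TypeCode = m.TypeCode AND d.ModelNumber = m.ModelNumber
    ORDER BY m.TypeCode
""").fetchall()
```

The join is the price of the split: requests that need both weights and dimensions become more expensive, which is why the split only pays off when such requests are rare or not performance-critical.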




Figure 7-16. Horizontal Splitting of Tuple Types

Notes:

Horizontal splitting of tuple types means that you partition the tuples of the tuple types. Basically, you create multiple tuple types with the same attributes as the original tuple type. Each of the new tuple types contains a part of the tuples of the old tuple type. The new tuple types are referred to as partitions (of the old tuple type).

How the tuples are partitioned is completely up to the application domain, and you should consult the application domain expert for advice. The partitioning need not be based on key ranges for the primary key.

In the example on the visual, the engines are partitioned into active engines and retired engines. Retired engines are engines permanently taken out of service. Active engines are engines still used by aircraft, even though they may not be mounted at present. The appropriate tuple types have been called ENGINE (for the active engines) and RETIRED ENGINE. We could have called the tuple type for the active engines differently, but it seemed handy to still call it ENGINE.

As illustrated on the visual, it might happen that, for a partition, some of the attributes do not assume a value for any of the tuples. These attributes can be dropped from the tuple type. On the visual, this is the case for attributes Aircraft Number and Engine Position of tuple type RETIRED ENGINE: Retired engines are not and will not be mounted on aircraft.

Visual: Horizontal Splitting of Tuple Types

BEFORE

ENGINE

Engine Number  Engine Type  Manufacturer Code  Aircraft Number  Engine Position
PW9880193      PW4062       PW                 B474001323       1
PW9880194      PW4062       PW                 B474001323       2
PW9880195      PW4062       PW
PW9882345      PW4062       PW                 B474001323       3
PW9974034      PW4062       PW                 B474001323       4
A862946RR      RB211-254    RR
A59A350RR      RB211-254    RR
R375184566     CF6-80C2     GE                 A103003167       1
R375184567     CF6-80C2     GE
R375184568     CF6-80C2     GE                 A103003167       2

AFTER

ENGINE

Engine Number  Engine Type  Manufacturer Code  Aircraft Number  Engine Position
PW9880193      PW4062       PW                 B474001323       1
PW9880194      PW4062       PW                 B474001323       2
PW9880195      PW4062       PW
PW9882345      PW4062       PW                 B474001323       3
PW9974034      PW4062       PW                 B474001323       4
R375184566     CF6-80C2     GE                 A103003167       1
R375184567     CF6-80C2     GE
R375184568     CF6-80C2     GE                 A103003167       2

RETIRED ENGINE

Engine Number  Engine Type  Manufacturer Code  Aircraft Number  Engine Position
A862946RR      RB211-254    RR
A59A350RR      RB211-254    RR

Especially if the partitions take on a new meaning, the horizontal splitting should be reflected in the entity-relationship model.

When horizontally splitting tuple types, other tuple types should not be referentially dependent on the split tuple type; otherwise, their referential integrity can no longer be enforced by means of the referential integrity support of the target database management system.

One reason for the horizontal splitting of tuple types is a size limitation for tables. Another reason may be that you want to assign the tuples for different responsibilities, branches, or uses to different tables to avoid concurrent access problems.
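The partitioning of the engines can be illustrated with a minimal sketch, again assuming SQLite (Python's sqlite3 module) as a stand-in for the target database management system. The view name ALL_ENGINES is hypothetical. It only shows how requests that still need all engines could recombine the partitions.

```python
import sqlite3

# Assumption: SQLite stands in for the target database management system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENGINE (
        Engine_Number     TEXT NOT NULL PRIMARY KEY,
        Engine_Type       TEXT NOT NULL,
        Manufacturer_Code TEXT NOT NULL,
        Aircraft_Number   TEXT,
        Engine_Position   INTEGER
    );
    -- Retired engines are never mounted, so the two mounting columns are dropped.
    CREATE TABLE RETIRED_ENGINE (
        Engine_Number     TEXT NOT NULL PRIMARY KEY,
        Engine_Type       TEXT NOT NULL,
        Manufacturer_Code TEXT NOT NULL
    );
    -- Hypothetical view recombining the partitions for requests needing all engines.
    CREATE VIEW ALL_ENGINES AS
        SELECT Engine_Number, Engine_Type, Manufacturer_Code,
               Aircraft_Number, Engine_Position
        FROM ENGINE
        UNION ALL
        SELECT Engine_Number, Engine_Type, Manufacturer_Code, NULL, NULL
        FROM RETIRED_ENGINE;
""")
conn.executemany(
    "INSERT INTO ENGINE VALUES (?, ?, ?, ?, ?)",
    [
        ("PW9880193", "PW4062", "PW", "B474001323", 1),
        ("PW9880195", "PW4062", "PW", None, None),
    ],
)
conn.execute("INSERT INTO RETIRED_ENGINE VALUES ('A862946RR', 'RB211-254', 'RR')")

active_count = conn.execute("SELECT COUNT(*) FROM ENGINE").fetchone()[0]
total_count = conn.execute("SELECT COUNT(*) FROM ALL_ENGINES").fetchone()[0]
```

Note that each primary key is only unique within its own partition. Uniqueness of engine numbers across both tables, like referential integrity toward the split tuple type, would have to be ensured by other means.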




7.2 Physical Implementation

Figure 7-17. Built-In Data Types

Notes:

When creating the tables for the target database management system, you must define the columns for the tables. Defining the columns means that you have to specify a name, a data type, and some additional column attributes for them. The names for the columns must follow the rules for the target database management system and must be unique for each table as was discussed earlier in this unit.

For the data types, you must translate the application-domain specific data types for the corresponding data elements into data types supported by the target system. Each database management system provides a set of built-in (standard) data types. For many columns, the built-in data types are sufficient. For data elements based on abstract data types, the built-in data types might not be sufficient and additional functions of the target database management system must be used to simulate the abstract data types as closely as possible. For now, let us concentrate on the built-in data types. Abstract data types will be discussed later in this topic.

Most of the database management systems provide built-in data types for character strings, numeric data, datetime data, and binary strings:

Visual: Built-In Data Types

Character String
    Single-Byte String
        Fixed length
        Varying length
        Character large object
    Double-Byte String
        Fixed length
        Varying length
        Double-byte character large object
Numeric Data
    Binary Integer
        Small
        Large
        Big
    Decimal
        Packed
    Floating Point
        Real
        Double
Datetime Data
    Date
    Time
    Timestamp
Binary String
    Binary large object

• The data types for numeric data generally support integers, decimal numbers, and floating-point numbers of varying sizes. The data types intended for integers generally have binary representations of two (SMALLINT), four (INTEGER), or eight (BIGINT) bytes supporting integers of different sizes. Check the reference manuals for your database management system to determine the data types supported and their value ranges. Not every database management system, for example, currently supports BIGINT.

Decimal numbers are generally specified by means of DECIMAL(m[,n]) or NUMERIC(m[,n]). Both specifications represent the same data type. m specifies the total number of digits and n the number of decimal places. If n is not specified, zero is assumed, i.e., the numbers are integers. Internally, decimal numbers are mostly stored in packed format. This means that each digit and the sign occupy half a byte. Again, check the reference manuals for the supported syntax and the value ranges.

Floating-point numbers are approximations of real numbers. Normally, the target database management systems support data types for single precision (REAL) and double precision (DOUBLE). DOUBLE provides a better approximation of the real numbers, but occupies more storage. In general, the representations occupy four and eight bytes, respectively. Because of the different internal representations, check the reference manuals for your database management system for the types supported and their value ranges.

• The data types for character strings support single-byte character strings and double-byte character strings. Single-byte character strings are sequences of one-byte characters. Thus, each byte of the string represents a character of the underlying character set. Frequently, if the context is clear, the term character string is used to denote single-byte character strings.

Double-byte character strings are also referred to as graphic strings. They are sequences of two-byte characters as required, for example, for some Asian character sets. Thus, every two bytes of the string represent a character of the underlying character set.

Both for single-byte and double-byte character strings, there are data types for fixed-length strings, short varying-length strings, and large varying-length strings. The latter are referred to as character large objects. For single-byte character strings, the appropriate data types are CHARACTER(), VARCHAR(), and CLOB(). For double-byte strings, they are GRAPHIC(), VARGRAPHIC(), and DBCLOB(). The maximum length for the various data types depends on the target database management system. Thus, check the reference manuals for your database management system to determine the types and the maximum lengths supported. Character large objects allow millions or even billions of characters.

• The datetime data types include data types for the date, the time, and timestamps. The appropriate data types are DATE, TIME, and TIMESTAMP. As usual, the date includes two digits for the day, two digits for the month, and four digits for the year. The time includes two digits each for the hour, the minute, and the second. Timestamps include date, time, and microseconds.




• Binary strings can be binary large objects (data type BLOB()). They are strings of bytes. Unlike character strings which usually contain text data, they are used to hold nontraditional data such as pictures. The maximum length can be millions or even billions of bytes.
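The storage sizes mentioned for the numeric data types can be turned into a small worked calculation. The following sketch assumes a two's-complement representation for binary integers and a packed format with exactly one half byte per digit plus one half byte for the sign. Actual sizes and value ranges depend on your database management system, so treat the function names and results as illustrative only.

```python
def binary_integer_range(size_in_bytes):
    """Value range of a binary integer, assuming two's complement."""
    bits = 8 * size_in_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def packed_decimal_bytes(m):
    """Bytes for DECIMAL(m, n) in packed format.

    m digits plus one sign, half a byte each, rounded up to whole bytes.
    The number of decimal places n does not change the storage size.
    """
    if m < 1:
        raise ValueError("the number of digits must be at least 1")
    return (m + 1 + 1) // 2


smallint_range = binary_integer_range(2)   # two bytes
integer_range = binary_integer_range(4)    # four bytes
decimal_5_2_size = packed_decimal_bytes(5)  # e.g., DECIMAL(5,2)
```

Under these assumptions, a DECIMAL(5,2) column needs three bytes per value, one byte fewer than a four-byte binary integer.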

Here are some design considerations for the built-in data types:

• Data that are numeric should be defined as numeric data and not as character strings even if you will not perform calculations with them. When the data are defined as numeric to the database management system, the database management system can verify the correctness of the data for you. In addition, it can check if the data fall in the supported or defined (check constraints) value ranges.

If the data are defined as character strings, all characters of the character set are valid and the business processes must verify the correctness of the data themselves.

• For integer data, you have multiple choices for the data type. If binary integer data types support the expected value range for the column, choose one of them because binary-integer operations are generally cheaper. Choose the data type that best fits the size of your expected data, but make sure that future growth of the values will not exceed its value range. If in doubt, choose the next bigger data type. To change the data type afterwards, you must delete the table and recreate it. This has consequences for the objects based on the table and for authorizations you have granted.

• For character columns, you may have the choice between CHARACTER and VARCHAR. If the actual length of the values varies, VARCHAR may save space. However, you should be aware that the system adds two bytes for storing the length in case of VARCHAR. Also, VARCHAR may slightly increase the processing time. Furthermore, programmers do not like to work with varying-length data.

Therefore, only use VARCHAR if the length of the data varies considerably or you do not have another choice because of the maximum length of the data. As a ballpark figure, the difference between the average length and the maximum length for the column should be greater than 25 bytes. The information for the corresponding data element in the data inventory should tell you this.

If your target system supports compression, the space argument for VARCHAR disappears and there is even less reason to use VARCHAR if you can use CHARACTER instead.

• If you have VARCHAR columns, you should define them as last columns of the table to save processing time. The sequence in which the columns are defined does not mandate a sequence for their retrieval. For mass retrieval, some database management systems calculate the offsets of the various columns once and not for every row retrieved. They can only do this for the columns preceding the first varying-length column and for the first varying-length column.
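The rule of thumb for choosing between CHARACTER and VARCHAR can be written down as a small decision function. This is only a sketch of the ballpark figure given above. The function name and its parameters are hypothetical, and the maximum length of fixed-length strings must be taken from the reference manuals of your database management system (the value 254 used in the calls below is an illustrative assumption).

```python
def suggest_character_type(average_length, maximum_length,
                           max_fixed_length, threshold=25):
    """Suggest CHARACTER or VARCHAR for a character column.

    average_length and maximum_length come from the data inventory.
    max_fixed_length is the longest CHARACTER column the target system allows.
    threshold is the ballpark difference of 25 bytes named in the text.
    """
    if maximum_length > max_fixed_length:
        return "VARCHAR"      # no other choice because of the maximum length
    if maximum_length - average_length > threshold:
        return "VARCHAR"      # the length varies considerably
    return "CHARACTER"        # fixed length is simpler and slightly faster


short_code = suggest_character_type(10, 20, max_fixed_length=254)
remark = suggest_character_type(30, 100, max_fixed_length=254)
long_text = suggest_character_type(290, 300, max_fixed_length=254)
```

If the target system supports compression, the second test loses its weight, as noted above, and CHARACTER becomes the natural choice whenever the maximum length permits it.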




Figure 7-18. Column Attributes - Nullable Columns

Notes:

For a tuple type, some attributes (e.g., the primary key attributes) must assume a value for every tuple whereas others need not. To correctly reflect this, it must be possible to specify for the columns of the corresponding tables whether or not they must assume a value for every row.

Indeed, it can be specified for a column, as a column attribute (characteristic), whether or not a value must be provided for every row. A column that need not assume a value for every row is referred to as a nullable column.

Internally, most database management systems use a special indicator, referred to as null indicator, to indicate if the column has a value for a row. If a column does not have a value for a row, it is said that the column has the value NULL for the row. This is a way of speaking even though it is a contradiction in terms.

If the value for a column is NULL for a row, a value has not been provided for that row. For numeric columns, this is different from a value of 0 (zero) for the column. For fixed-length character columns, it is different from a value of all blanks. For varying-length character columns, it is different from a character string of length 0.

Visual: Column Attributes - Nullable Columns

ENGINE

Engine_Number  Engine_Type  Manufacturer_Code  Aircraft_Number  Engine_Position  Remark
PW9880193      PW4062       PW                 B474001323       1
PW9880194      PW4062       PW                 B474001323       2
PW9880195      PW4062       PW                 --NULL--         --NULL--         Is not mounted
PW9882345      PW4062       PW                 B474001323       3
PW9974034      PW4062       PW                 B474001323       4
M18940012      CFM56        CFM                A192003001       0                Is mounted on position 0 (may be a valid position)
M18940015      CFM56        CFM                A192003001       1
M18940168      CFM56        CFM                --NULL--         --NULL--         Is not mounted

In the example on the visual, engine M18940012 has an engine position of 0 meaning that it is mounted on an aircraft in position 0. It does not mean that the engine is not mounted. Zero may be a valid engine position.

In contrast, engines PW9880195 and M18940168 have an engine position of NULL. This means that an engine position has not been provided for them: they are not mounted on an aircraft.

You should be aware that NULL values may lead to different results for SQL functions or operations than values of 0 or blanks or strings of length 0. For example, this is the case for the column functions AVG and COUNT and for Join operations.

Nullable columns occupy a little additional storage and their handling requires a little extra processing time. However, the additional storage or processing time is insignificant. You should define columns that, from the perspective of the application domain, may not contain a value as nullable and not try to save the extra overhead.
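The difference between NULL and 0 for the column functions AVG and COUNT can be seen with the engine positions from the visual. The sketch below assumes SQLite (Python's sqlite3 module) as a stand-in for the target database management system and uses only two of the columns.

```python
import sqlite3

# Assumption: SQLite stands in for the target database management system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ENGINE ("
    " Engine_Number TEXT NOT NULL PRIMARY KEY,"
    " Engine_Position INTEGER)"          # nullable: NOT NULL is not specified
)
conn.executemany(
    "INSERT INTO ENGINE VALUES (?, ?)",
    [
        ("PW9880193", 1), ("PW9880194", 2), ("PW9880195", None),
        ("PW9882345", 3), ("PW9974034", 4), ("M18940012", 0),
        ("M18940015", 1), ("M18940168", None),
    ],
)

# COUNT(*) counts rows; COUNT(column) and AVG(column) skip NULL values.
count_rows, count_positions, avg_position = conn.execute(
    "SELECT COUNT(*), COUNT(Engine_Position), AVG(Engine_Position) FROM ENGINE"
).fetchone()

# Treating "not mounted" as position 0 would change the average.
avg_if_zero = conn.execute(
    "SELECT AVG(COALESCE(Engine_Position, 0)) FROM ENGINE"
).fetchone()[0]

# A comparison with NULL is never true; IS NULL must be used instead.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM ENGINE WHERE Engine_Position = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM ENGINE WHERE Engine_Position IS NULL"
).fetchone()[0]
```

The two unmounted engines are left out of COUNT(Engine_Position) and AVG(Engine_Position), whereas engine M18940012 in position 0 is included in both.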




Figure 7-19. Nullable Columns and Cardinalities

Notes:

As you certainly remember, we have introduced cardinalities for the attributes of tuple types. The minimum cardinality for an attribute determines whether or not, in the context used, the attribute must always assume a value. Thus, the minimum cardinalities for the attributes determine whether or not the corresponding columns must always have a value.

The first example on the visual shows tuple type ENGINE. Its first three attributes do not have a cardinality specified. This means that their implied cardinality is [1..1]. Since their minimum cardinality is 1, the attributes and, thus, the corresponding columns must always have a value. This can be defined by specifying NOT NULL for the columns.

The last two attributes of tuple type ENGINE have a minimum cardinality of 0. Therefore, they need not assume a value for every tuple. Accordingly, the corresponding columns need not assume a value for every row, i.e., the columns are nullable. This can be defined by not specifying NOT NULL for the columns. By default, columns are nullable.

The second example on the visual illustrates the cardinalities for tuple type FLIGHT and demonstrates that the cardinalities must be interpreted in the context of the comprising structure: The first four attributes of FLIGHT are elementary attributes of the tuple type and have an implied minimum cardinality of 1. Since they are direct attributes of the tuple type, their minimum cardinality determines directly whether or not the corresponding columns are nullable.

Visual: Nullable Columns and Cardinalities

ENGINE

Attribute                 Column
Engine Number, PK         NOT NULL
Engine Type               NOT NULL
Manufacturer Code         NOT NULL
Aircraft Number [0..1]    Nullable
Engine Position [0..1]    Nullable

FLIGHT (only partly recoverable from the visual)

Attribute                                         Column
Flight Number, PK                                 NOT NULL
Flight Locator, PK                                NOT NULL
Departure AS Planned Departure (components)       NOT NULL
Arrival AS Actual Arrival [0..1] (components)     Nullable

The other elementary attributes are components of composite attributes. All their minimum cardinalities are 1. The minimum cardinalities of components do not alone determine whether or not the corresponding columns are nullable. However, if the minimum cardinality is 0, the column must be nullable.

If the minimum cardinality is 1, the associated column may still have to be nullable. This depends on the minimum cardinality of the comprising composite attribute, the minimum cardinality of the composite attribute comprising the composite attribute, and so on. If the minimum cardinality of the comprising composite attribute is 1 and the composite attribute is not again a component of another composite attribute, the corresponding column is not nullable. It must be defined with NOT NULL. If the composite attribute is again contained in a composite attribute, the minimum cardinality of the latter decides if the column will be nullable.

If the minimum cardinality of the composite attribute comprising the elementary attribute is 0, the corresponding column must be defined as nullable.

In the example on the visual, composite attributes Planned Departure and Planned Arrival have a minimum cardinality of 1. Since they are not again components of another composite attribute, the columns for their elementary attributes must be defined with NOT NULL. In contrast, composite attributes Actual Departure and Actual Arrival have a minimum cardinality of 0. Accordingly, the columns associated with their elementary attributes must be defined as nullable despite the minimum cardinality of 1 for the elementary attributes.

This added complexity stems from the fact that relational database management systems currently do not support composite attributes.
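The rules above boil down to one test: a column is nullable as soon as any minimum cardinality on the path from the elementary attribute up to the tuple type is 0. A minimal sketch of this test follows. The function names and the way the path is passed are assumptions made for illustration.

```python
def column_is_nullable(min_cardinalities):
    """Decide nullability from the minimum cardinalities along the path.

    min_cardinalities lists the minimum cardinality of the elementary
    attribute first, followed by that of each comprising composite
    attribute, from the innermost to the outermost.
    """
    return any(minimum == 0 for minimum in min_cardinalities)


def null_clause(min_cardinalities):
    """Return the clause to append to the column definition."""
    return "" if column_is_nullable(min_cardinalities) else "NOT NULL"


# Direct attributes of ENGINE: the path consists of the attribute only.
engine_number = null_clause([1])            # implied [1..1]
aircraft_number = null_clause([0])          # [0..1]

# Components of composite attributes of FLIGHT: attribute, then composite.
planned_departure_part = null_clause([1, 1])   # composite has minimum 1
actual_arrival_part = null_clause([1, 0])      # composite has minimum 0
```

An empty clause means that NOT NULL is simply not specified, since columns are nullable by default.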




Figure 7-20. Column Attributes - Default Values

Notes:

The discussions about columns that always must have a value or need not have a value for a row raise some questions:

• Independent of whether or not the column is nullable, what happens if a value is not provided for a row? Does the system provide a default value?

• For nullable columns, does the column receive the value NULL or another default value?

This and the next visual will answer these questions.

Most target database management systems allow you to specify that a default value should be assumed if a value is not provided for a row. The default values assumed can be system defaults or user defaults.

System defaults are default values used by the database management system if:

• The column may assume default values.

• The database administrator has not defined a default of his or her own for the column.

• The user has not provided a column value for a row.

All three conditions must be satisfied.

Visual: Column Attributes - Default Values (system defaults)

Category of data type      System default
Numeric data               0
Fixed-length strings       Blanks
Variable-length strings    String of length 0
Date                       Current date
Time                       Current time
Timestamp                  Current timestamp

User defaults are default values defined by the database administrator for columns. The selected values can be different from the system default values. User default values are assumed if:

• The column may assume default values.

• The database administrator has defined a default value of his or her own for the column.

• The user has not provided a column value for a row.

All three conditions must be satisfied.

The visual illustrates the system default values for the various categories of data types.




Figure 7-21. Selection of Default Values

Notes:

When defining a column for a table, you specify if the column may assume default values and which default value it should assume. This is controlled by the WITH DEFAULT keywords.

If the column is nullable and you do not specify WITH DEFAULT, the implicit default for the column is the NULL value. That is, the column will not contain a value for a row, if the user does not provide a value for the row on inserts.

If you specify WITH DEFAULT for nullable columns, the default value assumed depends on whether or not you have provided your own default value. If you have not provided a default value, the column will assume the system default value for the data type category of the column. If you provide your own default value, you can specify any value compatible with the data type for the column or explicitly request that the column is set to NULL.

Similarly, for columns that always must assume a value (NOT NULL), you can request that they assume a default value for a row if a value has not been provided. If you specify WITH DEFAULT, but do not provide your own default value, the system default for the appropriate category of data type is assumed. If you provide your own default value, it is assumed.

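Visual: Selection of Default Values

Column definition: column-name data-type, followed by the choices below

Column        WITH DEFAULT                        Value assumed if none is provided
(nullable)    Not specified                       NULL
(nullable)    WITH DEFAULT, nothing specified     System default
(nullable)    WITH DEFAULT, value specified       User provided value
(nullable)    WITH DEFAULT, NULL specified        NULL
NOT NULL      Not specified                       Must provide value
NOT NULL      WITH DEFAULT, nothing specified     System default
NOT NULL      WITH DEFAULT, value specified       User provided value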

Finally, if you do not specify WITH DEFAULT for a column that always must have a value, a value must be provided for every row inserted; otherwise, the request fails.
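The selection rules of this topic can be summarized in a small decision function. The following is a sketch only: the function and parameter names are hypothetical, and the system default for the data type category of the column (0, blanks, a string of length 0, or the current date, time, or timestamp) is simply passed in.

```python
_NOT_SPECIFIED = object()   # marks "WITH DEFAULT without an own value"


def value_when_omitted(nullable, with_default=False,
                       user_default=_NOT_SPECIFIED, system_default=None):
    """Return the value a column assumes when an insert provides none.

    Python's None plays the role of NULL.
    """
    if not with_default:
        if nullable:
            return None                 # implicit default is NULL
        raise ValueError("a value must be provided; the request fails")
    if user_default is _NOT_SPECIFIED:
        return system_default           # WITH DEFAULT, nothing specified
    if user_default is None and not nullable:
        # NULL cannot serve as the default of a NOT NULL column.
        raise ValueError("NULL is not a valid default for a NOT NULL column")
    return user_default                 # WITH DEFAULT with an own value


nullable_plain = value_when_omitted(nullable=True)
nullable_system = value_when_omitted(nullable=True, with_default=True,
                                     system_default=0)
not_null_user = value_when_omitted(nullable=False, with_default=True,
                                   user_default=1)
```

The only combination that makes an insert fail is a NOT NULL column without WITH DEFAULT for which no value is provided.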




Figure 7-22. Considerations for Abstract Data Types

Notes:

When implementing abstract data types as discussed in Unit 5 - Data and Process Inventories, the following considerations apply.

• Each abstract data type has its own set of allowable values and you must ensure that the values are properly represented in the database of the target database management system.

In some cases (e.g., for our sample abstract data type called name data), you want to store the data in a normalized format. Thus, you must ensure that the data in the database are in the normalized format.

• Abstract data types can be parameterized. In particular, they may allow you to specify minimum and maximum lengths for each usage by data elements. Thus, when implementing the abstract data type, you must ensure that the length constraints for data elements are reflected as constraints for the columns and enforced for each usage.

In addition, the data elements of the application domain may have domains, i.e., value constraints further restricting the values of their abstract data types. When defining a column based on such a data element, you must ensure that its value constraints are adhered to.

• Abstract data types generally provide a set of operations. When implementing an abstract data type, you must ensure that these operations can be performed. Also, you want to ensure that other illegal operations cannot be performed with data of the abstract data type.

• If data can be entered by end users in different formats, but you want to store the data in a normalized format, you should provide functions converting the external input into the normalized format. You need these functions for comparing entered data with the stored data (e.g., in the WHERE clause of SELECT statements). If the data entered were compared directly with the stored data, you would not necessarily find the requested data.

Visual: Considerations for Abstract Data Types

When implementing an abstract data type:

You must ensure that the data for the abstract data type are properly represented in the database

You must ensure that the data for the abstract data type satisfy any value and length constraints imposed on them

You must ensure that the desired operations, and only those, can be performed with data of the abstract data type

You want to provide functions converting external input into the stored format for the abstract data type
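The need for conversion functions can be demonstrated with a minimal sketch. It assumes SQLite (Python's sqlite3 module) as a stand-in for the target database management system. The table PASSENGER, the function NORMALIZE_NAME, and the normalized format itself (trimmed, single blanks, upper case) are hypothetical and are not the format defined for name data in Unit 5.

```python
import sqlite3


def normalize_name(raw):
    """Hypothetical normalized format: trimmed, single blanks, upper case."""
    return " ".join(raw.split()).upper()


# Assumption: SQLite stands in for the target database management system.
conn = sqlite3.connect(":memory:")
conn.create_function("NORMALIZE_NAME", 1, normalize_name)
conn.execute("CREATE TABLE PASSENGER (Last_Name TEXT NOT NULL)")

# External input is converted into the normalized format before it is stored.
conn.execute(
    "INSERT INTO PASSENGER VALUES (NORMALIZE_NAME(?))", ("  van   der Berg ",)
)

# The same function is applied to entered data before comparing it.
found = conn.execute(
    "SELECT COUNT(*) FROM PASSENGER WHERE Last_Name = NORMALIZE_NAME(?)",
    ("Van der  BERG",),
).fetchone()[0]

# Comparing the entered data directly with the stored data misses the row.
missed = conn.execute(
    "SELECT COUNT(*) FROM PASSENGER WHERE Last_Name = ?",
    ("Van der  BERG",),
).fetchone()[0]
```

Using one function for both storing and comparing keeps the two sides of the comparison in the same format, whatever that format is.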




Figure 7-23. User Defined Distinct Types

Notes:

User defined distinct types (UDTs) allow you to define your own data types based on the built-in data types provided by the target database management system. However, they are fairly simple-minded data types and cannot be parameterized.

When you create a user defined distinct type, you must select a built-in data type.

The built-in data type is referred to as source data type. If the source data type allows you to specify a length, a number of digits, or a number of decimal places, you must specify the appropriate values when you create the user defined distinct type.

Even if the user defined distinct type is based on a varying-length built-in data type, you cannot specify a length later when the user defined distinct type is used as data type for a column. The maximum length for the column is that defined for the user defined distinct type. If you want to use the same user defined distinct type for multiple columns, you must define it with the maximum length for any anticipated columns. You must use other means to restrict the actual lengths of the columns. Alternatively, use different user defined distinct types.

Visual: User Defined Distinct Types

User Defined Distinct Types
    Allow you to define your own data types based on built-in data types
    Cannot be parameterized
    Always have a fixed maximum length
        Even if based on a varying-length built-in data type
        For a varying-length source data type, the maximum length is the length specified when the user defined distinct type is created
        Cannot specify a different (smaller) maximum length when the user defined distinct type is used by a column
        Must define it with the maximum length intended for any columns and restrict the actual column lengths by other means
    Disallow all operations for source data type except comparisons
        Can only compare data of same user defined distinct type
        Cannot compare directly with data of source data type
        Must cast to source data type to compare with source data type
    Prevents illegal operations and incorrect comparisons

With the exception of the comparison operations, the user defined distinct type does not inherit any functions or operations of its source data type. Without further actions, you cannot use any scalar or column functions for the source data type.

The comparison operations inherited are limited to the comparison of data belonging to the user defined distinct type. You cannot directly compare data of the user defined distinct type with data of the source data type. When you create a user defined distinct type, cast functions are provided allowing you to change source data to data of the user defined distinct type and vice versa. The cast data can then be compared with data of the appropriate data type. The cast function changing data of the source data type to data of the user defined distinct type has the same name as the user defined distinct type:

udt-name(source-data) → udt-data

The cast function changing data of the user defined distinct type to data of the source data type has the same name as the source data type:

source-name(udt-data) → source-data

By using user defined distinct types, you can prevent illegal operations and incorrect comparisons for columns of the same source data type having different semantics.

User defined distinct types are not supported by all target database management systems.




Figure 7-24. User Defined Distinct Types - Example

Notes:

The example on the visual creates two user defined distinct types: One user defined distinct type is based on built-in data type DECIMAL and is intended to represent measurements in meters; the other is based on built-in data type INTEGER and is supposed to represent measurements in centimeters. Their names are METER and CM, respectively.

When creating the table for tuple type AIRCRAFT MODEL, columns Length_of_Model and Height_of_Model are defined with user defined distinct type CM. Column Wing_Span is defined with user defined distinct type METER. (Note that you should really have defined all three dimensions with the same user defined distinct type.)

If you want to determine all aircraft models whose length is smaller than their wing span, you cannot specify Length_of_Model < Wing_Span in the WHERE clause of the SELECT statement. This is because the user defined distinct types of Length_of_Model and Wing_Span are different. The comparison would indeed provide an incorrect result and is, therefore, considered illegal.

Visual: User Defined Distinct Types - Example

AIRCRAFT MODEL
    Type Code, PK
    Model Number, PK
    Dimensions
        Length
        Height
        Wing Span
    Weights
    Cruising Speed
    Range

CREATE DISTINCT TYPE METER AS DECIMAL(5,2) WITH COMPARISONS

CREATE DISTINCT TYPE CM AS INTEGER WITH COMPARISONS

CREATE TABLE AIRCRAFT_MODEL(
    . . .
    Length_of_Model CM NOT NULL,
    Height_of_Model CM NOT NULL,
    Wing_Span METER NOT NULL,
    . . .
)

SELECT * FROM AIRCRAFT_MODEL WHERE Length_of_Model < Wing_Span

Different user defined distinct types: ILLEGAL!!!

For a valid and correct comparison, you must cast the two columns to their source data types and convert meters to centimeters in the WHERE clause:

WHERE INTEGER(Length_of_Model) < 100 * DECIMAL(Wing_Span)
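The behavior of the two distinct types can be mimicked outside the database to see why the direct comparison is rejected. The following Python sketch is an analogy only, not an implementation of user defined distinct types. The class names are hypothetical, reading the value attribute plays the role of the cast to the source data type, and the A340 length of 59.40 m is assumed to be stored as 5940 cm.

```python
from decimal import Decimal


class DistinctType:
    """Analogy of a distinct type: only same-type comparisons are allowed."""

    def __init__(self, value):
        self.value = value      # reading it corresponds to the cast function

    def __lt__(self, other):
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} "
                f"with {type(other).__name__}"
            )
        return self.value < other.value


class Meter(DistinctType):
    """Stands for METER, based on DECIMAL(5,2)."""


class Cm(DistinctType):
    """Stands for CM, based on INTEGER."""


length_of_model = Cm(5940)                # A340 length in centimeters
wing_span = Meter(Decimal("60.30"))       # A340 wing span in meters

# Length_of_Model < Wing_Span: different distinct types, therefore rejected.
try:
    length_of_model < wing_span
except TypeError as error:
    illegal_comparison_message = str(error)

# INTEGER(Length_of_Model) < 100 * DECIMAL(Wing_Span): cast, then convert.
is_shorter = length_of_model.value < 100 * wing_span.value
```

Had the direct comparison been allowed, 5940 would have been compared with 60.30 and the A340 would wrongly have been reported as longer than its wing span.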




Figure 7-25. User Defined Functions (UDFs)

Notes:

User defined functions (UDFs) allow you to write your own functions for use in SQL statements. The user defined functions provided by you can be used in Data Manipulation Language (DML) statements or Data Definition Language (DDL) statements. DML statements are SELECT, INSERT, UPDATE, or DELETE. DDL statements are SQL statements creating, altering, and deleting database objects, such as tables, indexes, user defined distinct types, or user defined functions.

User defined functions can either be external functions or sourced functions. External functions are based on programs, written in any of the programming languages supported by the target database management system, that you provide. Of course, the functions have to follow certain conventions concerning the passing and returning of arguments, but, in the programs, you can pretty much do what you want. Depending on the database management system, you may even issue SQL statements.

Sourced functions are based on existing built-in (system provided) functions or existing user defined functions. Their primary purpose is to extend existing functions (e.g., the AVG function or the LENGTH function) for the source data type to a newly created user defined distinct type. They also allow you to rename an existing built-in function or user defined function.

Visual: User Defined Functions (UDFs)

User Defined Functions
    External Functions (based on programs written by you)
        Scalar Functions
        Table Functions
    Sourced Functions (based on existing built-in or user defined functions; allow you to extend existing functions to new user defined distinct types)
        Scalar Functions
        Column Functions

Allow you to write your own functions
    To be used in SQL DML statements
    To be used in SQL DDL statements
Scalar functions are passed arguments and return a single value
Column functions are passed a column and return a single value
Table functions are passed arguments and return a table
    One row for each invocation
    Can only be used in FROM clause

User defined functions can be scalar functions, column functions, or table functions. Scalar functions are passed a set of arguments and return a single value. An example of a built-in scalar function is the LENGTH function which returns the length of the expression (argument) passed to it.

Column functions are passed the values of a column (or a subset thereof) and return a single value which generally is derived from the values of the column. An example of a built-in column function is the MIN function which returns the minimum of the column values passed to it.

Table functions are passed a set of arguments and return a table row for each invocation. They can only be used in the FROM clause of SELECT statements.

External functions can either be scalar functions or table functions. They cannot be column functions. Sourced functions can only be scalar functions or column functions.

You can overload functions. You can define multiple functions with the same name as long as the signatures of the various functions are different. This means that the data type of at least one parameter must be different. Based on the data types of the arguments passed, the database management system is capable of selecting the proper function.

User defined functions are not supported by all target database management systems.
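The difference between scalar functions and column functions can be tried out with any system that supports user defined functions. The following sketch uses Python's sqlite3 module rather than DB2; the function names INITIALS and LONGEST, the table, and the sample rows are assumptions made for illustration only. The scalar function is invoked once per row, the column function once for the set of column values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def initials(name):
    """Scalar function: is passed arguments and returns a single value."""
    return "".join(word[0].upper() for word in name.split())


class Longest:
    """Column function: is passed the values of a column, returns one value."""

    def __init__(self):
        self.best = None

    def step(self, value):
        # Called once for every row of the column.
        if value is not None and (self.best is None or len(value) > len(self.best)):
            self.best = value

    def finalize(self):
        # Called once at the end to deliver the single result.
        return self.best


conn.create_function("INITIALS", 1, initials)
conn.create_aggregate("LONGEST", 1, Longest)

conn.execute("CREATE TABLE pilot (name TEXT)")
conn.executemany(
    "INSERT INTO pilot VALUES (?)",
    [("amelia earhart",), ("charles lindbergh",), ("bessie coleman",)],
)

scalar_result = [r[0] for r in conn.execute(
    "SELECT INITIALS(name) FROM pilot ORDER BY name")]
column_result = conn.execute("SELECT LONGEST(name) FROM pilot").fetchone()[0]
```

Both functions are invoked by name with their arguments in parentheses, exactly like built-in functions.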





Figure 7-26. UDFs - Definition and Invocation CF182.0

Notes:

This visual illustrates the definition of an external scalar and a sourced scalar user defined function using user defined distinct type TEXTDATA.

The first user defined function, called NORM, checks data of user defined distinct type TEXTDATA passed to it for correctness (it may only contain certain characters) and converts it into a normalized text-data format.

Since the function is passed arguments and returns a single value, it is a scalar function. When you define a function, you must describe the signature of the function. You must specify:

• The name of the function.

• The data type(s) (including lengths) of the arguments passed to the function (i.e., of the parameters for the function) or of the column passed. The latter applies to column functions.

UDFs - Definition and Invocation

CREATE DISTINCT TYPE TEXTDATA AS VARCHAR(100) WITH COMPARISONS

CREATE FUNCTION NORM(TEXTDATA)
  RETURNS TEXTDATA
  EXTERNAL NAME 'program'
  LANGUAGE programming-language
  . . .

Checks text data string for correctness and converts it into stored format (normalizes it). The program resides in a program library.

CREATE FUNCTION SUBSTR(TEXTDATA, INTEGER, INTEGER)
  RETURNS VARCHAR(100)
  SOURCE SYSIBM.SUBSTR(VARCHAR(), INTEGER, INTEGER)

• Must provide signature of function
  • Name of function
  • Data type(s) of parameters for function or of column passed
• Must describe output returned
  • For scalar or column functions, data type of value returned
  • For table functions, names and data types of columns returned
• Invocation: function-name ( expression , . . . )

You must also describe the output returned. For scalar or column functions, you must specify the data type of the value returned. For table functions, you must specify the names and the data types of the columns returned.

Function NORM is an external scalar function since it is based on a user-provided program. To allow the database management system to establish the connection to the program when the function is used, the object program to be executed must be identified when the function is defined. The programming language in which the program has been written must be identified as well.

The second function on the visual is a sourced function extending built-in function SUBSTR to text data. When you define it, you must again specify its signature and output for the new data type. Furthermore, you must tell the system on which existing function it is based (SOURCE). For the source function, you must provide its signature as well. For the parameters of the source function, you need not provide lengths or decimal places since they are already known to the system. However, you must specify the enclosing parentheses if the data type has parameters.

The qualifier SYSIBM in the example identifies the source function as a built-in function of an IBM database management system.

A user defined function is invoked by specifying its name followed, in parentheses, by the arguments passed to the function.





Figure 7-27. Check Constraints CF182.0

Notes:

Check constraints allow you to restrict the accepted values for columns of tables beyond the values permitted by the column's data type.

Check constraints can be defined on the column level or on the table level. This means that they can be defined for a particular column or for the table as such. When a check constraint is defined for a column, it can just restrict the values for the column concerned. References to other columns are not allowed.

In contrast, a check constraint that is defined on the table level can refer to any defined column of the table. Thus, it can restrict the values of columns in relationship to each other. For example, you may enforce that the value of one column must not exceed the value of another column of the same row.

Check constraints use check expressions. Basically, a check expression is a search condition evaluating to true, false, or unknown. It may consist of predicates combined by the logical operators AND and OR. A predicate specifies a condition that is true, false, or unknown. The result is unknown, for example, when comparing with a NULL value.

Check Constraints

• Allow you to restrict acceptable values for columns
• Enforced during the insertion, updating, and loading of rows
• Can be defined on column or table level
  • On column level, to restrict accepted values for column concerned
  • On table level, to restrict accepted values for columns of table in relationship to each other
• Basically, check expression is a search condition evaluating to true, false, or unknown
  • Predicates can be combined by AND and OR
• Restrictions for check expressions depending on target DBMS
  • For example, for DB2 UDB for UNIX- and Intel-Based Platforms
    • Subselects not allowed
    • Some restrictions on use of user-defined functions
  • For example, for DB2 UDB for OS/390
    • Subselects not allowed
    • Built-in or user-defined functions not allowed
    • CASE expressions not allowed
    • EXISTS and quantified predicates not allowed
    • First operand of predicate must be a column

If the check expression for a constraint evaluates to true or unknown, the constraint is considered as satisfied.

Some database management systems impose severe restrictions on the check expressions of check constraints. The visual lists some for DB2 Universal Database for z/OS and DB2 Universal Database for UNIX- and Intel-Based Platforms. For the precise restrictions, see the reference manuals for your database management system.

Check constraints are enforced during the insertion, updating, and loading of rows.

Check constraints need not be defined when the table is created. They can be added later. However, they are only enforced during subsequent operations. Existing rows are not automatically rechecked when a check constraint is added.
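The difference between column-level and table-level check constraints, and the treatment of unknown results, can be observed in a small experiment. The sketch below uses Python's sqlite3 module as a stand-in for the target database management system; table FLIGHT_LEG, its columns, and the constraint name are hypothetical. Note that the third insert is accepted because comparisons with NULL evaluate to unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flight_leg (
        leg_no  INTEGER NOT NULL,
        seats   INTEGER CHECK (seats > 0),                   -- column level
        booked  INTEGER,
        CONSTRAINT booked_le_seats CHECK (booked <= seats)   -- table level
    )
""")


def try_insert(leg_no, seats, booked):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO flight_leg VALUES (?, ?, ?)",
                     (leg_no, seats, booked))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_true = try_insert(1, 100, 80)      # both check expressions are true
rejected_false = try_insert(2, 100, 120)    # booked <= seats is false
accepted_unknown = try_insert(3, None, 50)  # comparisons with NULL are unknown
```

If the unknown case must be rejected as well, the columns concerned have to be defined as NOT NULL.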





Figure 7-28. Check Constraints - Examples CF182.0

Notes:

The first example on the visual illustrates how abstract data type AIRPORT CODE defined in Unit 5 - Data and Process Inventories could be implemented. The abstract data type has a finite set of values, namely, the three-letter codes for airports. Columns of the abstract data type could be defined as 3-character columns with the check constraint shown on the visual. The check expression for the check constraint uses the IN predicate listing the valid character strings. On the visual, only a few values are shown as indicated by the ellipsis.

The second example on the visual implements the domain for data element Number of Engines defined in Unit 5 - Data and Process Inventories. It uses the BETWEEN predicate to enforce that the values for column Number_of_Engines, i.e., the number of engines for an aircraft type, are between 0 and 4.

Note that check constraints can be named.
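To see the two kinds of predicates at work, the constraints can be tried out in a scratch database. The following sketch uses Python's sqlite3 module instead of DB2 and, as an assumption for brevity, reduces each table to the single column concerned and lists only five airport codes. The helper function violation is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airport (
        airport_code CHARACTER(3) NOT NULL
            CONSTRAINT apc
            CHECK (airport_code IN ('ATL', 'CDG', 'DFW', 'FCO', 'FRA'))
    );
    CREATE TABLE aircraft_type (
        number_of_engines INTEGER NOT NULL
            CONSTRAINT no_engines
            CHECK (number_of_engines BETWEEN 0 AND 4)
    );
""")


def violation(sql, value):
    """Return the error text if the insert is rejected, otherwise None."""
    try:
        conn.execute(sql, (value,))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok_airport = violation("INSERT INTO airport VALUES (?)", "FRA")
bad_airport = violation("INSERT INTO airport VALUES (?)", "XXX")
ok_engines = violation("INSERT INTO aircraft_type VALUES (?)", 4)
bad_engines = violation("INSERT INTO aircraft_type VALUES (?)", 5)
```

Naming the constraints makes it easier to identify which rule was violated, although the exact wording of the error text depends on the database management system and its version.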

Check Constraints - Examples

Values of abstract data type:

CREATE TABLE AIRPORT(
  Airport_code CHARACTER(3) NOT NULL
    CONSTRAINT APC
    CHECK( Airport_code IN ( 'ATL', 'CDG', 'DFW', 'FCO', 'FRA',
                             'JFK', 'LAS', 'LAX', 'MAD', 'ORD',
                             'SAN', 'SFO', 'SJC', 'STR', 'ZRH', . . . ) ),
  . . .
)

Domain for data element:

CREATE TABLE AIRCRAFT_TYPE(
  . . .
  Number_of_Engines INTEGER NOT NULL
    CONSTRAINT NO_ENGINES
    CHECK( Number_of_Engines BETWEEN 0 AND 4 ),
  . . .
)

Figure 7-29. Triggers CF182.0

Notes:

A trigger defines a set of actions to be performed when a specific event occurs. Triggers are defined for tables. The execution of the actions for the trigger can be triggered by insert, update, or delete operations on the table for the trigger.

Triggers can be used to cause updates to other tables; automatically generate or transform values for inserted or updated rows; or invoke functions to perform tasks such as issuing alerts.

Triggers are a useful mechanism to define and enforce transitional business rules, i.e., rules involving different states of the data. Using triggers places the logic to enforce the business rules in the database and relieves the business processes using the tables from having to enforce it. Centralized logic means easier maintenance since no program changes are required when the logic changes.

The following items must be considered when defining a trigger:

Triggers

A trigger is a set of actions to be performed when a specific event occurs

• Triggering operations: INSERT, DELETE, UPDATE (any columns or selected columns)
• Granularity: for each row processed, or once for SQL statement
• Activation time: before change applied, or after change applied
• Prerequisite conditions: triggered actions applied conditionally, determined by search condition (WHEN clause)
• Triggered actions: a set of SQL statements
  • Before triggers: fullselect, SIGNAL SQLSTATE, SET transition-variable
  • After triggers: fullselect, INSERT, UPDATE, DELETE, SIGNAL SQLSTATE

Triggering Operations
Triggers are defined for tables. When defining a trigger, you must specify the operation to which the trigger applies, i.e., which will cause the actions for the trigger to be executed. The operation can be:

• an insert operation (INSERT)

• a delete operation (DELETE)

• an update operation (UPDATE)

For update operations, the trigger can apply to the updating of selected columns or the updating of arbitrary columns of the table.

Granularity

The trigger can be executed for each row inserted, updated, or deleted or once for the INSERT, UPDATE, or DELETE statement. This is referred to as the granularity of the trigger.

Activation Time

Triggers can be executed before the changes of the triggering operation are applied or after they have been applied. Depending on the time when they are applied, triggers are classified as before triggers or after triggers.

Before triggers can be used to set or change the values for insert or update operations. An after trigger can be used, for example, to reflect changes to the table for the trigger in another table. For example, as rows are added to or deleted from the table for the trigger, a row count in another table can be increased or decreased.

Prerequisite Conditions

The execution of the actions for a trigger can be made conditional: The actions are only performed if a specified prerequisite condition is met. The prerequisite condition, a search condition, is specified by means of a WHEN clause. The actions for the trigger are only executed if the search condition evaluates to true.

Triggered Actions

The actions for a trigger consist of one or more SQL statements. They are only executed if the search condition for the trigger evaluates to true. The SQL statements that can be part of the actions depend on the type of trigger.

If the trigger is a before trigger, the actions can generally include fullselects, signal SQL states, or set transition variables. Transition variables allow you to refer to values of the rows affected by the trigger.

If the trigger is an after trigger, the triggered actions can generally include fullselects, INSERT, DELETE, or UPDATE statements, or signal SQL states.
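The row-count example mentioned under Activation Time can be sketched as follows. The sketch uses Python's sqlite3 module, whose trigger syntax differs from DB2 (for example, there is no MODE DB2SQL clause); the tables AIRCRAFT and ROW_COUNT, the trigger names, and the sample keys are hypothetical. Both triggers are after triggers with row granularity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aircraft (tail_no TEXT PRIMARY KEY);
    CREATE TABLE row_count (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO row_count VALUES ('aircraft', 0);

    -- After trigger, executed for each row inserted.
    CREATE TRIGGER aircraft_ins AFTER INSERT ON aircraft
    FOR EACH ROW
    BEGIN
        UPDATE row_count SET n = n + 1 WHERE table_name = 'aircraft';
    END;

    -- After trigger, executed for each row deleted.
    CREATE TRIGGER aircraft_del AFTER DELETE ON aircraft
    FOR EACH ROW
    BEGIN
        UPDATE row_count SET n = n - 1 WHERE table_name = 'aircraft';
    END;
""")

conn.executemany("INSERT INTO aircraft VALUES (?)",
                 [("D-ABCD",), ("N12345",), ("HB-XYZ",)])
conn.execute("DELETE FROM aircraft WHERE tail_no = 'N12345'")

count_after = conn.execute(
    "SELECT n FROM row_count WHERE table_name = 'aircraft'").fetchone()[0]
```

The programs inserting and deleting rows need not know about the counter; the logic is kept in the database.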




Figure 7-30. Triggers - Some Additional Remarks CF182.0

Notes:

Triggers can reference the values of the affected rows. They can refer to the values before the execution (update or delete operations) and/or after the execution (update or insert operations) of the triggering SQL operation. The appropriate version of the data (OLD or NEW) can be identified by means of the REFERENCING clause when defining the trigger.

As mentioned for the previous visual, before triggers can change the values of columns of the affected rows. They can do this by setting transition variables via the SET transition-variable SQL statement. Transition variables use the names of the columns, qualified by a correlation name assigned to the version of the data via the REFERENCING clause. The SET transition-variable SQL statement is also referred to as SET assignment SQL statement.

In contrast to check expressions which, most of the time, are more restrictive, triggers can generally use built-in functions and user defined functions. The functions can be used by the search condition of the WHEN clause as well as by the triggered actions.

Triggers - Some Additional Remarks

• Triggers can refer to values before (UPDATE, DELETE) and after (UPDATE, INSERT) the execution of the triggering SQL operation
  • Must use REFERENCING clause to identify version (OLD or NEW)
• Triggers may change data before it is stored
  • By means of SET transition-variable SQL statement
• Triggers can use built-in functions and user defined functions
• After triggers can cause other triggers to fire
  • Triggers for tables used by triggered actions
• Multiple triggers can be defined for same event
  • Trigger created first, fires first
• Triggers not effective during loading of rows
• Not all target database management systems support triggers
  • Some (minor) restrictions may apply

The actions of after triggers can cause other triggers to fire, namely, triggers for the tables maintained by the triggered actions. Since INSERT, UPDATE, and DELETE statements are not permitted for before triggers, they cannot cause other triggers to fire.

Multiple triggers can be defined for the same table. You can even define multiple triggers for the same event. If multiple triggers are defined for the same event, the trigger created first fires first.

There is one drawback associated with triggers: triggers are not effective during the loading of data.

Not all of the target database management systems support triggers. The various database management systems supporting triggers may have restrictions. However, in general, the restrictions are minor and less severe than the restrictions for check constraints.
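References to the old and the new version of a row, the restriction to selected columns, and a prerequisite condition can be combined in one trigger. The sketch below again uses Python's sqlite3 module as a stand-in: SQLite provides the fixed correlation names OLD and NEW instead of a REFERENCING clause. The tables FARE and FARE_LOG and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fare (route TEXT PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE fare_log (route TEXT, old_price INTEGER, new_price INTEGER);

    -- Fires only for updates of column price, and only if the value changes.
    CREATE TRIGGER fare_upd AFTER UPDATE OF price ON fare
    FOR EACH ROW
    WHEN NEW.price <> OLD.price
    BEGIN
        INSERT INTO fare_log VALUES (OLD.route, OLD.price, NEW.price);
    END;

    INSERT INTO fare VALUES ('FRA-JFK', 500);
""")

conn.execute("UPDATE fare SET price = 550 WHERE route = 'FRA-JFK'")  # logged
conn.execute("UPDATE fare SET price = 550 WHERE route = 'FRA-JFK'")  # no change

log_rows = conn.execute(
    "SELECT route, old_price, new_price FROM fare_log").fetchall()
```

The second UPDATE statement activates the trigger, but the search condition of the WHEN clause evaluates to false, so the triggered action is not executed.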




Figure 7-31. A Sample Abstract Data Type - Name Data CF182.0

Notes:

Now, we want to illustrate the implementation of a sample abstract data type. We have chosen abstract data type Name Data described in Unit 5 - Data and Process Inventories. Its description is repeated on the visual. Its values consist of strings of letters, blanks, and single dashes (-) or periods (.).

There are two operations defined for the abstract data type. The Normalization operation (NORM) removes all leading and trailing blanks from a name-data string; reduces intermediate groups of blanks to a single blank each; and uppercases all letters. In other words, it produces a normalized version of the string.

The Equal Comparison operation (EQUAL) defines when two name-data strings are considered equal. They are considered equal if their normalized versions are the same.

As you can see from the signature of the data type, it is parameterized. For a data element using it, the minimum length and the maximum length of the accepted strings can be specified.

In the database, we want to store all data in the normalized format.
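The two operations can be written down in a few lines. The following Python sketch is one possible reading of the description, not the program used in the course: it assumes that letters means the unaccented letters A to Z in either case, and that single dashes or periods means that neither two dashes nor two periods may follow each other. The helper is_name_data is hypothetical; length limits are not checked here.

```python
import re

# Assumption: only unaccented letters, blanks, dashes, and periods are allowed.
_ALLOWED = re.compile(r"[A-Za-z .-]+")


def is_name_data(text):
    """Check the character set and reject doubled dashes or periods."""
    return (_ALLOWED.fullmatch(text) is not None
            and "--" not in text
            and ".." not in text)


def norm(text):
    """NORM: strip outer blanks, collapse inner blanks, uppercase letters."""
    if not is_name_data(text):
        raise ValueError(f"not valid name data: {text!r}")
    return " ".join(text.split()).upper()


def equal(name_data_1, name_data_2):
    """EQUAL: two strings are equal if their normalized versions are the same."""
    return norm(name_data_1) == norm(name_data_2)


normalized = norm("  wright   bros. ")
same = equal("Wright Bros.", " wright  bros.")

try:
    norm("wright bros..")          # two successive periods
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```

Because EQUAL is defined in terms of NORM, storing only normalized strings reduces the Equal Comparison operation to an ordinary character comparison.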

A Sample Abstract Data Type - Name Data

Signature: NAMEDATA( [ minimum-length ] [ , maximum-length ] )

Values: Any string of letters, blanks, and single dashes (-) or periods (.). Minimum-length and maximum-length specify how many characters the string has at least (default: 1) and at most (default: unlimited).

Operations:

Normalize Name Data: NORM(name-data-1) → name-data-2
• Removes all leading and trailing blanks from name-data-1
• Reduces intermediate groups of blanks for name-data-1 to a single blank each
• Uppercases all letters of name-data-1

Equal Comparison: EQUAL(name-data-1, name-data-2) → { TRUE | FALSE }
• Normalizes name-data-1 and name-data-2 and compares them character by character

In the database, the data are to be stored normalized

Figure 7-32. Setting Up the Abstract Data Type CF182.0

Notes:

The approach chosen for the implementation of the abstract data type uses a user defined distinct type for the abstract data type because we want to discuss some related problems. It prevents the comparison of character strings that are not name data with name data. It would be possible to implement the abstract data type without a user defined distinct type which has some advantages, but also some disadvantages.

First, we define a user defined distinct type called NAMEDATA consisting of varying-length character strings. When defining the user defined distinct type, you must provide a maximum length for the source data type. Since the abstract data type is parameterized, we need to specify the maximum length that any columns using it may have. However, you must choose the maximum length carefully to ensure that the rows for the tables will fit into the pages for the tables. The system will enforce this when the tables are created. Thus, the lengths of the candidate columns should not vary too much.

In the example, we have restricted the maximum length of NAMEDATA columns to 100 characters. The columns using the data type may use smaller maximum lengths. We must enforce the length ranges for the columns by other means. We will see later on how this can be achieved.

Setting Up the Abstract Data Type

CREATE DISTINCT TYPE NAMEDATA AS VARCHAR(100) WITH COMPARISONS
• Absolute maximum length allowed for columns using the data type
• Columns will use smaller maximum lengths
• Length ranges for columns will be limited by other means

Normalization function:

CREATE FUNCTION NORM(NAMEDATA)
  RETURNS NAMEDATA
  EXTERNAL NAME 'program'
  LANGUAGE programming-language
  . . .
• Checks name data string for valid name data
• Returns nonzero SQL state if not valid name data
• Returns zero SQL state and normalized name data string otherwise

CREATE FUNCTION LENGTH(NAMEDATA)
  RETURNS INTEGER
  SOURCE SYSIBM.LENGTH(VARCHAR())
• Extends LENGTH built-in function to name data
• Required for enforcing length ranges for columns

Next, we define a user defined function, called NORM, corresponding to the Normalization function. However, it is not quite the Normalization function since it performs some additional validity checking. The function is an external scalar function accepting strings of user defined distinct type NAMEDATA. It checks if the input string has a valid name-data format, i.e., only contains letters (small or capital), blanks, and single dashes or periods. If the string is invalid, a nonzero SQL state is returned by the function.

If the input string is valid, the function returns a zero SQL state and converts the input string to its normalized name-data format. The data type for the output is NAMEDATA, the user defined distinct type. Note that user defined functions must return an SQL state in addition to the output described in their definition. The SQL state is checked by the target database management system to determine whether to continue or terminate the operation being performed.

The function does not immediately accept variable-length character strings that are not of type NAMEDATA. If you want to use it to convert other character strings to normalized name data, you must first apply the system-provided cast function for user defined distinct type NAMEDATA:

NORM(NAMEDATA(character-string))

For Figure 7-26 (UDFs - Definition and Invocation), we defined a user defined function NORM whose only input parameter was of type TEXTDATA, another user defined distinct type. Note that both user defined functions may exist at the same time because their signatures are different.

Since the enforcement of the length ranges for the columns needs to determine the length of input data, we must extend the LENGTH built-in function to user defined distinct type NAMEDATA. This is done by the second user defined function on the visual, a sourced scalar function.





Figure 7-33. INSERT Triggers for Abstract Data Type CF182.0

Notes:

By means of the user defined functions on the previous visual, we can enforce that:

• The data of the column is always valid, i.e., only contains characters and character sequences permitted for the abstract data type.

• The data of the column is stored in normalized format: leading and trailing blanks are removed, intermediate blanks are reduced to a single blank each, and alphabetical characters are in upper case.

• The minimum length and the maximum length for the column are observed.

For insert operations, this can be achieved by the two triggers on this visual. The triggers are defined for each table containing name-data columns. The table must be created before the triggers for the table can be defined.

Symbolic variables (in italics) are used in the CREATE TRIGGER statements on the visual. If you want to create the triggers, you must replace them by the actually applicable values. Table-name, column-name, minimum-length, and maximum-length must be replaced by the name of the table, the name of the column, the minimum length for the column, and the maximum length, respectively.

INSERT Triggers for Abstract Data Type

1. Checks for correct data and normalizes input string

CREATE TRIGGER INSNAME1
  NO CASCADE BEFORE
  INSERT ON table-name
  REFERENCING NEW AS N
  FOR EACH ROW
  MODE DB2SQL
  BEGIN ATOMIC
    SET N.column-name = NORM(N.column-name);
  END

2. Checks for correct column length and sets SQL state

CREATE TRIGGER INSNAME2
  NO CASCADE BEFORE
  INSERT ON table-name
  REFERENCING NEW AS N
  FOR EACH ROW
  MODE DB2SQL
  WHEN ( LENGTH(N.column-name) NOT BETWEEN minimum-length AND maximum-length )
  BEGIN ATOMIC
    SIGNAL SQLSTATE '72001' ('INVALID COLUMN LENGTH');
  END

Both triggers are activated for each row to be inserted. They are activated before the row is inserted. The first trigger (INSNAME1) uses user defined function NORM, in a SET transition-variable SQL statement, to verify the correctness of the input string and to normalize it. You need the correlation name defined via the REFERENCING clause on both sides of the equal sign. On the right-hand side, you need it for the user defined function to refer to the entered value for the new row. On the left-hand side, you need it because you are changing the column value for the new row.

If the user defined function returns a zero SQL state, the value of the column for the row becomes the normalized string and this will be the value inserted. If the user defined function returns a nonzero SQL state, the SET transition-variable SQL statement fails and the INSERT statement fails.

Note that the input string for the column is of type NAMEDATA when the trigger receives it. A character string entered as input for the column in the INSERT statement is converted to type NAMEDATA by the cast function for the user defined distinct type.

The second trigger (INSNAME2) ensures that the length range for the column is enforced during insert operations. The WHEN clause checks if the length of the column is outside the range defined by minimum-length and maximum-length. If it is outside, the WHEN condition evaluates to true and a nonzero SQL state is signaled by means of the SIGNAL SQLSTATE SQL statement. The nonzero SQL state causes the INSERT statement to terminate.

The sequence in which the triggers are defined is relevant. The triggers must be created in the sequence on the visual. As a consequence, the length check is performed for the normalized string (which may be shorter) and not for the original input string.

You may ask whether a check constraint for the column could be used instead of the second trigger. The restrictions for check constraints are generally more severe than those for triggers and your database management system may not allow you to use an equivalent check constraint. For example, Version 6 of DB2 Universal Database for z/OS does not allow you to use built-in functions or user defined functions in check expressions. In addition, the result would not be quite the same. A check expression would verify the length of the unnormalized string whereas the trigger verifies the length of the normalized string.

If you have multiple name-data columns for a table, you need not have two triggers for each column. In the first trigger, you can use multiple SET transition-variable SQL statements as triggered actions to check and normalize all columns. In the second trigger, you can combine the length checks for all columns by logical ORs.
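The interplay of the normalization function and the two triggers can be reproduced on a small scale. The sketch below is an approximation in Python's sqlite3 module, not the DB2 solution of the visual. SQLite has no SET transition-variable statement, so the stored value is normalized by an after trigger instead, and the length check applies NORM itself to see the normalized string. Table MANUFACTURER, the trigger names, and the length range 2 to 20 are assumptions; the validation rule is the reading of Name Data assumed for this sketch (unaccented letters, no doubled dashes or periods).

```python
import re
import sqlite3


def norm(text):
    """Validate and normalize name data; raising makes the SQL statement fail."""
    if text is None:
        return None
    if not re.fullmatch(r"[A-Za-z .-]+", text) or "--" in text or ".." in text:
        raise ValueError("invalid name data")
    return " ".join(text.split()).upper()


conn = sqlite3.connect(":memory:")
conn.create_function("NORM", 1, norm)
conn.executescript("""
    CREATE TABLE manufacturer (name TEXT NOT NULL);

    -- Length check on the normalized value (range 2 to 20 assumed).
    CREATE TRIGGER insname_len BEFORE INSERT ON manufacturer
    FOR EACH ROW
    WHEN length(NORM(NEW.name)) NOT BETWEEN 2 AND 20
    BEGIN
        SELECT RAISE(ABORT, 'INVALID COLUMN LENGTH');
    END;

    -- SQLite cannot assign to NEW, so the row is normalized after the insert.
    CREATE TRIGGER insname_norm AFTER INSERT ON manufacturer
    FOR EACH ROW
    BEGIN
        UPDATE manufacturer SET name = NORM(NEW.name) WHERE rowid = NEW.rowid;
    END;
""")


def try_insert(value):
    """Return True if the row is stored, False if the statement fails."""
    try:
        conn.execute("INSERT INTO manufacturer VALUES (?)", (value,))
        return True
    except sqlite3.DatabaseError:
        return False


inserted = try_insert("  wright   bros. ")
bad_chars = try_insert("wright bros..")     # rejected by NORM
too_long = try_insert("a" * 30)             # rejected by the length check
stored = [r[0] for r in conn.execute("SELECT name FROM manufacturer")]
```

As in the DB2 solution, the length is checked for the normalized string, which may be shorter than the string entered.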





Figure 7-34. UPDATE Triggers for Abstract Data Type CF182.0

Notes:

This visual illustrates the triggers needed for update operations to ensure the correctness of the new column values; to ensure the observance of length constraints for the column; and to store the new column values in normalized format.

Both triggers are activated for each row before the row is updated. They are only activated if the appropriate column is updated (UPDATE OF column-name). Otherwise, the same remarks apply as to the triggers for insert operations.

UPDATE Triggers for Abstract Data Type

1. Checks for correct data and normalizes input string

CREATE TRIGGER UPDNAME1
  NO CASCADE BEFORE
  UPDATE OF column-name ON table-name
  REFERENCING NEW AS N
  FOR EACH ROW
  MODE DB2SQL
  BEGIN ATOMIC
    SET N.column-name = NORM(N.column-name);
  END

2. Checks for correct column length and sets SQL state

CREATE TRIGGER UPDNAME2
  NO CASCADE BEFORE
  UPDATE OF column-name ON table-name
  REFERENCING NEW AS N
  FOR EACH ROW
  MODE DB2SQL
  WHEN ( LENGTH(N.column-name) NOT BETWEEN minimum-length AND maximum-length )
  BEGIN ATOMIC
    . . .
  END

Figure 7-35. Abstract Data Type - Inserting and Updating CF182.0

Notes:

The above visual illustrates the flow of control and the conversions of input during insert and update operations.

If a character string is assigned to a field defined as NAMEDATA, the system-provided cast function for the user defined distinct type is automatically invoked. Thus, you need not invoke it yourself. It casts the character string to user defined distinct type NAMEDATA, the data type of the input parameter for user defined function NORM.

Next, the first (insert or update) trigger is activated which uses user defined function NORM to normalize the value and assigns the normalized value to the column for the row. If the value passes the length checks of the second trigger, the row is stored with the normalized value for the column.

In the first example, an insert request, string 'wright bros.' is converted to 'WRIGHT BROS.' which then is assigned to the column and stored since it passes the length checks.

Abstract Data Type - Inserting and Updating

Column defined as NAMEDATA. Casting since assignments of values.

INSERT INTO table-name ( . . . , column-name , . . . )
  VALUES( . . . , 'wright bros.' , . . . )

Flow: system-provided cast function for NAMEDATA → user defined function NORM in first trigger → 'WRIGHT BROS.' → length checks by second trigger

UPDATE table-name
  SET column-name = 'wright bros..'

Flow: system-provided cast function for NAMEDATA → user defined function NORM in first trigger → nonzero SQL state

The second example on the visual illustrates, for an update request, what happens if the new value for a column, defined as NAMEDATA, is invalid. User defined function NORM, called by the SET transition-variable SQL statement of the triggered action, determines that the input string does not have a valid name-data format (two successive periods). It returns a nonzero SQL state which is passed on by the trigger and causes the update request to fail.



Figure 7-36. Abstract Data Type - Selecting Data CF182.0

Notes:

To retrieve specific rows based on a search condition for a column defined as NAMEDATA, you must use both the system-provided cast function and user defined function NORM.

As described for user defined distinct types, you cannot directly compare values of a column of a user defined distinct type with values of the source type. Accordingly, you cannot directly compare the values of a column of user defined distinct type NAMEDATA with a character string. You must first convert the character string to user defined distinct type NAMEDATA. Furthermore, the input string should be normalized to ensure that the corresponding rows are found in the table independent of the way they have been entered.

Both are achieved by first applying system-provided cast function NAMEDATA to the string and then user defined function NORM:

NORM(NAMEDATA(string))

System-provided cast function NAMEDATA casts the string to user defined distinct type NAMEDATA. Only then, user defined function NORM can be applied since its input must be of type NAMEDATA. You cannot apply user defined function NORM directly to the character string. Since the output of NORM is of type NAMEDATA, it can be compared with the values of the column.

Abstract Data Type - Selecting Data

SELECT . . .
  FROM table-name
  WHERE column-name = NORM ( NAMEDATA ( string ) )

• NAMEDATA ( string ): Casts input string to NAMEDATA and allows it to be input for NORM function
• NORM ( . . . ): Normalizes string and allows it to be compared with values in column

To avoid the invocation of two functions, you could define an additional user defined function whose input parameter is of type VARCHAR(); whose output is of type NAMEDATA; and which normalizes the input string.

Note that you will receive an operands-not-comparable SQL code if you compare the character string directly with the values of the column.
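The typing rules described for this visual can be mimicked in a general-purpose language. In the following Python sketch, class NameData is a hypothetical stand-in for the user defined distinct type, its constructor plays the role of the system-provided cast function, and norm_string is the additional combined function suggested above. Validation of the characters is omitted to keep the sketch short.

```python
class NameData:
    """Hypothetical stand-in for distinct type NAMEDATA (source type: string)."""

    def __init__(self, text):
        # The constructor plays the role of the system-provided cast function.
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, NameData):
            raise TypeError("operands not comparable")
        return self.text == other.text


def norm(value):
    """NORM accepts NAMEDATA only and returns NAMEDATA."""
    if not isinstance(value, NameData):
        raise TypeError("NORM requires NAMEDATA input")
    return NameData(" ".join(value.text.split()).upper())


def norm_string(text):
    """Combined function: character string in, normalized NAMEDATA out."""
    return norm(NameData(text))


column_value = NameData("WRIGHT BROS.")     # value as stored in the column

found = norm_string("  wright  bros. ") == column_value

try:
    column_value == "WRIGHT BROS."          # distinct type vs. source type
    direct_comparison_allowed = True
except TypeError:
    direct_comparison_allowed = False

try:
    norm("wright bros.")                    # NORM applied without the cast
    norm_without_cast_allowed = True
except TypeError:
    norm_without_cast_allowed = False
```

The combined function does not weaken the type checking: its result is still of the distinct type and can only be compared with values of that type.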




Figure 7-37. An Alternate Implementation (1 of 2) CF182.0

Notes:

This visual and the next illustrate an alternate implementation for abstract data type NAMEDATA. The implementation does not use a user defined distinct type. The name-data columns for a table are defined with built-in data type VARCHAR(). As length of the column, the actual maximum length for the column is chosen.

As before, we need a user defined function checking the correctness of input for the columns and normalizing the input strings. This time, we call the function NAMEDATA. The data type for its only parameter as well as for its output is VARCHAR(). As length, we use the maximum length for any anticipated name-data column. This allows us to use the same function for all columns.

Because we do not use a user defined distinct type, we need not define a sourced user defined function LENGTH. For determining the length of strings, we can use the LENGTH built-in function.

An Alternate Implementation (1 of 2)

No user defined distinct type

Name-data columns defined as VARCHAR() with actual maximum lengths for columns

User defined function NAMEDATA verifies correctness of input and normalizes it. Input and output are VARCHAR()

CREATE FUNCTION NAMEDATA(VARCHAR(100))
RETURNS VARCHAR(100)
EXTERNAL NAME 'program'
LANGUAGE programming-language
. . .

VARCHAR(100): Maximum length of any name-data columns intended

No sourced function LENGTH needed. Built-in function LENGTH can be used since a user defined distinct type is not used





Figure 7-38. An Alternate Implementation (2 of 2) CF182.0

Notes:

The triggers needed are basically the same as for the other solution. The only differences are:

• User defined function NAMEDATA is used instead of user defined function NORM.

• The second trigger only needs to check the minimum length. The maximum length is enforced by the column length.

Again, you need two triggers each for insert operations and for update operations.

On SELECT statements, you use the NAMEDATA function to normalize the search string.
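The approach described in these notes can be tried out with the following minimal sketch. It uses Python's sqlite3 module as a stand-in for the target database management system, so several details are assumptions rather than part of the course material: the table PASSENGER, the column LAST_NAME, the minimum length of 2, and the normalization rule (collapse blanks, convert to upper case) are all hypothetical. SQLite cannot assign to NEW values in a BEFORE trigger, so the normalization is applied in the INSERT statement itself instead of in a first trigger.

```python
import sqlite3


def namedata(value):
    """Hypothetical normalization: collapse blanks and convert to upper case."""
    if value is None:
        return None
    return " ".join(value.split()).upper()


conn = sqlite3.connect(":memory:")
# Register the Python function as SQL function NAMEDATA with one parameter.
conn.create_function("NAMEDATA", 1, namedata)

conn.executescript("""
CREATE TABLE passenger (last_name VARCHAR(40) NOT NULL);

-- Counterpart of the second trigger: reject names below the minimum length.
-- (SQLite does not enforce the declared maximum length of VARCHAR(40).)
CREATE TRIGGER insname2 BEFORE INSERT ON passenger
FOR EACH ROW WHEN length(NEW.last_name) < 2
BEGIN
    SELECT RAISE(ABORT, 'name too short');
END;
""")


def insert_name(raw):
    # Normalize on the way in, since a SQLite trigger cannot change NEW values.
    conn.execute("INSERT INTO passenger (last_name) VALUES (NAMEDATA(?))", (raw,))


insert_name("  van   der Berg ")

# On SELECT, normalize the search string with the same function.
rows = conn.execute(
    "SELECT last_name FROM passenger WHERE last_name = NAMEDATA(?)",
    ("Van der  berg",),
).fetchall()
print(rows)
```

Because both the stored value and the search string pass through the same function, differently spelled inputs still compare equal.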

An Alternate Implementation (2 of 2)

Triggers use function NAMEDATA and need only check for minimum length

CREATE TRIGGER INSNAME1
NO CASCADE BEFORE INSERT ON table-name
REFERENCING NEW AS N
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
   SET N.column-name = NAMEDATA(N.column-name);
END

1 Checks for correct data and normalizes input string

CREATE TRIGGER INSNAME2
NO CASCADE BEFORE INSERT ON table-name
REFERENCING NEW AS N
FOR EACH ROW MODE DB2SQL
WHEN ( LENGTH(N.column-name) < minimum-length )
. . .
END

2 Checks for correct column length and sets SQL state

Similar triggers for UPDATE

On SELECT, use user defined function NAMEDATA to normalize input

SELECT . . .
FROM table-name
WHERE column-name = NAMEDATA ( string )




Figure 7-39. Token Translation Tables CF182.0

Notes:

Frequently, the columns of tables contain a well-defined, previously known small set of values. For table SEAT on the top of the visual, this is the case for columns SEAT_LOCATION, SEAT_CLASS, and SECTION.

To save space, short tokens (often numbers) are frequently stored in the table instead of the lengthy actual values. Descriptions for the tokens are kept in separate tables as illustrated in the lower part of the visual. These descriptive tables are referred to as token translation tables.

To display the rows of the main table with the actual values and not with the tokens, you need Join operations to fill in the actual values. Even though the token translation tables are small compared to the main table and their rows will probably be in the buffers of the database management system, the Join operations may create a performance problem. In addition, the Join operations will complicate the retrieval of the rows. Furthermore, the number of tables that can be joined is generally limited.

The use of token translation tables is certainly not recommended if compression is used, since the space savings in this case do not warrant the performance degradation and the additional effort.
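The Join operations mentioned above can be sketched as follows. This is only an illustration using Python's sqlite3 module as a stand-in for the target database management system. The table and column names follow the visual, but the data types and the three sample rows loaded here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Token translation tables (lower part of the visual).
CREATE TABLE seat_location (seat_location INTEGER PRIMARY KEY, text VARCHAR(10));
CREATE TABLE seat_class    (seat_class    INTEGER PRIMARY KEY, text VARCHAR(10));
CREATE TABLE section       (section       CHAR(1) PRIMARY KEY, text VARCHAR(10));

-- Main table storing tokens instead of the actual values.
CREATE TABLE seat (
    aircraft_number CHAR(10),
    seat_number     VARCHAR(3),
    seat_location   INTEGER,
    seat_class      INTEGER,
    section         CHAR(1)
);

INSERT INTO seat_location VALUES (1, 'WINDOW'), (2, 'MIDDLE'), (3, 'AISLE');
INSERT INTO seat_class    VALUES (1, 'FIRST'), (2, 'BUSINESS'), (3, 'ECONOMY');
INSERT INTO section       VALUES ('N', 'N/SMOKING'), ('S', 'SMOKING');
INSERT INTO seat VALUES
    ('B474001323', '1A',  1, 1, 'N'),
    ('B474001323', '46J', 1, 3, 'S'),
    ('B171004217', '1B',  3, 2, 'N');
""")

# One join per token column is needed to display the actual values.
rows = conn.execute("""
    SELECT s.aircraft_number, s.seat_number, l.text, c.text, x.text
    FROM seat s
    JOIN seat_location l ON l.seat_location = s.seat_location
    JOIN seat_class    c ON c.seat_class    = s.seat_class
    JOIN section       x ON x.section       = s.section
    ORDER BY s.aircraft_number, s.seat_number
""").fetchall()

for row in rows:
    print(row)
```

Each additional token column adds another join to every query that must show the actual values, which is the retrieval complication the notes describe.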

Token Translation Tables

SEAT (with actual values)

Aircraft_Number  Seat_Number  Seat_Location  Seat_Class  Section
B474001323       1A           WINDOW         FIRST       N/SMOKING
B474001323       1B           MIDDLE         FIRST       N/SMOKING
B474001323       1C           AISLE          FIRST       N/SMOKING
. . .            . . .        . . .          . . .       . . .
B474001323       46J          WINDOW         ECONOMY     SMOKING
B171004217       1A           WINDOW         BUSINESS    N/SMOKING
B171004217       1B           AISLE          BUSINESS    N/SMOKING
. . .            . . .        . . .          . . .       . . .
B171004217       28G          WINDOW         ECONOMY     N/SMOKING

SEAT (with tokens)

Aircraft_Number  Seat_Number  Seat_Location  Seat_Class  Section
B474001323       1A           1              1           N
B474001323       1B           2              1           N
B474001323       1C           3              1           N
. . .            . . .        . . .          . . .       . . .
B474001323       46J          1              3           S
B171004217       1A           1              2           N
B171004217       1B           3              2           N
. . .            . . .        . . .          . . .       . . .
B171004217       28G          1              3           N

SEAT LOCATION

Seat_Location  Text
1              WINDOW
2              MIDDLE
3              AISLE

SEAT CLASS

Seat_Class  Text
1           FIRST
2           BUSINESS
3           ECONOMY

SECTION

Section  Text
N        N/SMOKING
S        SMOKING

To save space, frequently, tokens are stored instead of actual values

Descriptions for tokens are kept in separate tables (token translation tables)

Requires Join operations to display actual values

May create a performance problem

Not recommendable if system supports compression





Figure 7-40. Token Translation Tables - An Alternative CF182.0

Notes:

Instead of token translation tables, you can use check constraints in conjunction with CASE expressions to achieve the same space savings without the problems of Joins.

For a column concerned, you can provide a check expression using the IN predicate to list all allowed tokens. Using a check expression ensures that only correct values are in the columns.

On retrieval, you use a CASE expression when selecting the column. The CASE expression allows you to translate the tokens into the actual values that should be returned.

There is one disadvantage with this method you should be aware of: When new values are added, you must change the SELECT statements. If they are contained in views, you must drop the views. The consequence is that authorizations for the views are lost and must be reestablished. There is not a problem with the check constraints because they can be deleted and added again without impact.
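The combination of check constraint and CASE expression described in these notes can be sketched as follows. The example uses Python's sqlite3 module as a stand-in for the target database management system. The constraint name seat_class_check, the column seat_number, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE seat (
        seat_number VARCHAR(3)   NOT NULL,
        seat_class  CHARACTER(1) NOT NULL
            CONSTRAINT seat_class_check
            CHECK (seat_class IN ('1', '2', '3'))
    )
""")


def try_insert(seat_number, seat_class):
    """Return True if the row was accepted, False if the check rejected it."""
    try:
        conn.execute("INSERT INTO seat VALUES (?, ?)", (seat_number, seat_class))
        return True
    except sqlite3.IntegrityError:
        return False


try_insert("1A", "1")
try_insert("28G", "3")

# The CASE expression translates the stored tokens into the actual values.
rows = conn.execute("""
    SELECT seat_number,
           CASE seat_class
               WHEN '1' THEN 'FIRST'
               WHEN '2' THEN 'BUSINESS'
               WHEN '3' THEN 'ECONOMY'
           END AS seat_class
    FROM seat
    ORDER BY seat_number
""").fetchall()
print(rows)
```

Adding a fourth class would require changing both the IN list of the constraint and every CASE expression that translates the column, which is the maintenance cost named above.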

Token Translation Tables - An Alternative

CREATE TABLE SEAT
( . . .
  Seat_Class CHARACTER(1) NOT NULL
     CONSTRAINT CLASS CHECK ( Seat_Class IN ( '1', '2', '3' ) ),
  . . .
)

SELECT . . . ,
       CASE Seat_Class
          WHEN '1' THEN 'FIRST'
          WHEN '2' THEN 'BUSINESS'
          WHEN '3' THEN 'ECONOMY'
       END AS Seat_Class,
       . . .
FROM SEAT

Ensures correctness of values

Makes actual values available



7.3 Documentation



Figure 7-41. Documenting User Defined Distinct Types CF182.0

Notes:

For user defined distinct types, you just need to provide their name and source data type and a description for which types of data (columns) they should be used. For the source data type, the (maximum) length, the number of digits, and/or the number of decimal places must be provided in accordance with the requirements for the source data type.

To use a user defined distinct type with a varying-length source data type for multiple columns, you must specify the maximum length of any columns using it.

Documenting User Defined Distinct Types

For each user defined distinct type:

Name: A unique name for the distinct type in compliance with the naming requirements of the target DBMS

Source Type: Built-in data type on which the user defined distinct type is based including any lengths and decimal places

   For fixed-length string data types, the length of the strings

   For varying-length string data types, the maximum length for the distinct data type without considerations for columns

   For decimal data types, number of digits and number of decimal places

Description: A description for which type of data (columns) the distinct type should be used.





Figure 7-42. Documenting User Defined Functions (1 of 2) CF182.0

Notes:

The documentation for a user defined function includes:

• Name, signature, and output returned by the user defined function.

• The category of the user defined function (scalar function, column function, or table function).

• The type of the user defined function (external or sourced).

• A textual description of the user defined function.

• For an external function, name, location, and programming language for the object program used by the user defined function.

• For a sourced user defined function, the built-in or user defined function on which the user defined function is sourced including the appropriate parameters.

The items are described on this visual and the next. For the name, signature, and the output returned, all relevant information is contained on the current visual.

Documenting User Defined Functions (1 of 2)

For each user defined function:

Name: The name for the user defined function in compliance with the naming requirements of the target DBMS

Signature: Signature of function in the form

   name ( parameter-1, parameter-2, ...)

   For each input parameter, specify its built-in or user-defined data type including length, number of digits, and/or number of decimal places

Output Returned:

   For scalar or column functions:

      The built-in or user-defined data type returned including length, number of digits, and/or decimal places

      A textual description of the output

   For each column returned by a table function:

      Its column name

      The built-in or user-defined data type for the column including length, number of digits, and/or decimal places

      A textual description of the column




Figure 7-43. Documenting User Defined Functions (2 of 2) CF182.0

Notes:

The textual description should outline in detail what the function does. This is especially important for external functions. The description should include any SQL states returned and their meaning.

For the program source, the name and library for the object program (load module or DLL), should be provided if they are already known. The object program is invoked by the user defined function, not the source program. (Note that the title of the item is Program Source and not Source Program.) At the time the function is documented, some of the information for this item may not yet be available. However, you can already select a name for the object program. The missing information must be provided later.

Documenting User Defined Functions (2 of 2)

Category: Category for function: scalar function, column function, or table function

Type: Type of function: external function or sourced function

Description: Textual description of function

Source Function: Built-in or user defined function the current function is based upon including data types of parameters for the function (sourced functions only)

Program Source: Name of object program supporting the function and name of library containing it (external functions only)

Programming Language: Programming language of program supporting function (external functions only)





Figure 7-44. Documenting Check Constraints CF182.0

Notes:

For check constraints, the following items need to be documented:

• The name of the table to which the constraint applies.

• If the constraint applies to a particular column, the name of the column to which it applies.

This item is not applicable to check constraints defined on the table level.

• Although the database management systems do not generally force you to specify a name for a check constraint, you should give a name to each check constraint. This eases the maintenance of check constraints.

The names for check constraints need only be unique for each table. Nevertheless, it is recommended that you use unique names for all check constraints of your application domain.

• A detailed textual description outlining what the check constraint achieves.

• The search condition for the check constraint. When specifying the search condition for the check constraint, verify with your database administrator that it can be implemented, i.e., that it only uses functions supported by your database management system.

Documenting Check Constraints

For each check constraint:

Table: Name of table to which constraint applies

Column: If constraint applies to a particular column, name of column to which constraint applies

Constraint Name: Name for constraint (unique for table)

Description: Textual description of condition to be checked by constraint

Check Condition: Search condition for condition to be checked by constraint





Figure 7-45. Documenting Tables - Table Info (1 of 2) CF182.0

Notes:

The information to be documented for a table can be subdivided into table-related information and column-related information. The current visual and the next describe the table-related information to be documented.

The long table name can be stored into the system tables for the target database management system by means of the LABEL ON TABLE SQL statement if that is supported by the target database management system. The description can be stored by means of the COMMENT ON TABLE SQL statement if that is supported by the target database management system.

If the primary key consists of multiple columns, it is important to establish and specify the logical sequence of the columns within the primary key. This will become relevant when talking about foreign keys in a later unit.

Under the heading Check Constraints, only constraints should be listed that are not column specific. The column-specific check constraints are listed for the columns.

Documenting Tables - Table Info (1 of 2)

For each table:

Table Name: The name of the table in the target DBMS. The maximum length for table names depends on the target DBMS

Table Long Name: Optional. An additional long name for the table referred to as label. Can be stored in system tables. Cannot be used in SQL statements

Description: Optional. A textual description for the table referred to as comment. Can be stored in system tables

Primary Key: Names and sequence of columns belonging to primary key for table

Check Constraints: Names of check constraints for table rather than for individual columns




Figure 7-46. Documenting Tables - Table Info (2 of 2) CF182.0

Notes:

The items on this visual represent information the database administrator needs to know for the assignment of primary and secondary allocation units and for scheduling reorganizations.

Length changes of rows during updates may cause rows of a table to be relocated. This may lead to indirect accesses, which decrease performance.

Documenting Tables - Table Info (2 of 2)

Number of Rows: Number of rows initially in table

Inserts/Time Interval: Expected number of inserts during a time interval (e.g., a month)

Insert Pattern: Distribution of inserts over primary key values (e.g., equally distributed or ever increasing key values)

Updates/Time Interval: Expected number of updates during a time interval (e.g., a week)

Length Changes: Percentage of updates causing length changes of rows

Deletes/Time Interval: Expected number of deletions during a time interval (e.g., a month)

Delete Pattern: Distribution of deletions over primary key values (e.g., equally distributed or lowest key values)





Figure 7-47. Documenting Tables - Column Information CF182.0

Notes:

The long column name can be stored into the system tables for the target database management system by means of the LABEL ON [COLUMN] SQL statement if supported by the target database management system. The description can be stored by means of the COMMENT ON [COLUMN] SQL statement if that is supported by the target database management system.

Under the heading Check Constraints, constraints just involving the column are to be listed and not table-level constraints, i.e., constraints that involve multiple columns.

Documenting Tables - Column Information

For each column of a table:

Column Name: The name of the column in the target DBMS. Maximum length for column name depends on target DBMS

Column Long Name: Optional. An additional long name for the column referred to as label. Can be stored in system tables. Cannot be used in SQL statements

Description: Optional. A textual description for the column referred to as comment. Can be stored in system tables

Data Type: Built-in or user-defined data type for column including length, number of digits, and/or number of decimal places

Column Attributes: Additional attributes for column such as nullable, NOT NULL, WITH DEFAULT, and default values

Check Constraints: Names of check constraints for column (column-specific check constraints only)




Figure 7-48. Documenting Triggers CF182.0

Notes:

For each trigger, all the items we discussed in detail should be documented. Verify with your database administrator that the search condition for the WHEN clause of the trigger can be implemented, i.e., only uses functions supported by your database management system. Also verify that the intended actions are supported by the target database management system.

Documenting Triggers

For each trigger:

Name: Name for trigger

Table: Name of table to which the trigger applies

Triggering Operation: Operation to which trigger applies (INSERT, DELETE, UPDATE, or UPDATE OF column)

Time Applied: When trigger is applied: BEFORE operation or AFTER operation

Granularity: If trigger applies to each row or to SQL statement

Definition Sequence: If multiple triggers for same event, a number determining the sequence in which triggers must be created

Prerequisite Conditions: Search condition that must be TRUE for trigger to fire

Triggered Actions: Actions to be performed when trigger fires (SQL statements)




Checkpoint

1. How are tuple types translated into tables?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. When can tuple types be merged?

_____________________________________________________

_____________________________________________________

_____________________________________________________

3. When can you imbed a tuple type into another tuple type?

_____________________________________________________

_____________________________________________________

_____________________________________________________

4. The tuple types for 1:1 or 1:m relationship types can always be merged or imbedded. (T/F)

5. Assume that T and T1 through Tn are tuple types satisfying the following conditions:

• They all have the same primary key.

• At all times, each primary key value of T1 through Tn occurs in T.

Which further condition must be satisfied for T1 through Tn being a perfect decomposition of T?

_____________________________________________________

_____________________________________________________

_____________________________________________________




6. Give two reasons why you may not want to combine two tuple types that theoretically could be combined.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

7. Name three limitations typically existing for relational database management systems.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

8. The fixed-length size of pages may cause space to be wasted. (T/F)

9. Denormalization causes the redundant storage of information. (T/F)

10.Denormalization consciously violates the First Normal Form. (T/F)

11. Vertical splitting moves some attributes of a tuple type to another tuple type with the same primary key. (T/F)

12.Horizontal splitting of a tuple type always creates tuple types for different primary key ranges of the original tuple type. (T/F)




13.Match the following categories with the listed built-in data types:

a. Binary integers                  ____ VARCHAR
b. Decimal numbers                  ____ INTEGER
c. Floating-point numbers           ____ REAL
d. Binary strings                   ____ DECIMAL
e. Datetime data                    ____ BIGINT
f. Single-byte character strings    ____ BLOB
g. Double-byte character strings    ____ DATE
                                    ____ SMALLINT
                                    ____ CHARACTER
                                    ____ DOUBLE
                                    ____ GRAPHIC
                                    ____ CLOB
                                    ____ NUMERIC
                                    ____ TIMESTAMP

14.For varying-length character strings, a value of NULL has the same meaning as a string of length 0. (T/F)

15.Whether or not a column must always assume a value is specified by means of the keywords NOT NULL and NULL, respectively. (T/F)

16.Describe the difference between system default values and user default values.

_____________________________________________________

_____________________________________________________

_____________________________________________________

17.How can you provide your own default value for a column?

_____________________________________________________

_____________________________________________________

_____________________________________________________




18.User defined distinct types must be based on built-in data types. They cannot be based on other user defined distinct types. (T/F)

19.When using a user defined distinct type that is based on VARCHAR for a specific column, you can specify a maximum length for the column that is different from the length specified for the user defined distinct type. (T/F)

20.Describe the difference between external and sourced user defined functions.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

21.A primary purpose of sourced user defined functions is to promote existing functions to new user defined distinct types. (T/F)

22.Establish the proper relationships:

a. Scalar functions can be    ____ Sourced functions only
b. Column functions can be    ____ External functions only
c. Table functions can be     ____ External or sourced functions

23.Describe the major purpose of check constraints.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

24.Check constraints defined on the column level may refer to other columns of the table. (T/F)




25.What is a trigger?
_____________________________________________________

_____________________________________________________

26.Triggers can be activated by SELECT, UPDATE, INSERT, or DELETE statements. (T/F)

27.A trigger can be executed for each row processed or once for the triggering SQL statement. (T/F)

28.When can triggers be activated?

_____________________________________________________

_____________________________________________________

_____________________________________________________

29.Which of the following SQL statements are allowed for before triggers?

a. Fullselects.

b. INSERT statements.

c. UPDATE statements.

d. DELETE statements.

e. SIGNAL SQLSTATE statements.

f. SET transition-variable statements.

30.Which of the following SQL statements are allowed for after triggers?

a. Fullselects.

b. INSERT statements.

c. UPDATE statements.

d. DELETE statements.

e. SIGNAL SQLSTATE statements.

f. SET transition-variable statements.




31.Triggers can change the values of columns before they are stored. (T/F)

32.Generally, triggers can use user defined functions. (T/F)






Notes:


Tuple type for a supertype with an exclusive and covering subtype set can be eliminated (perfect decomposition)

Tuple types with always corresponding primary key values can be merged

Tuple types whose primary key values are always a subset of the primary key values of another tuple type can be imbedded in the other tuple type if:

   For each potential tuple, at least one nonkey attribute has a value

Tuple types for 1:1 and 1:m relationship types can always be merged or imbedded

You do not always want to combine tuple types

   If other tuple types referentially dependent on tuple type to be eliminated

   If restrictions for database management system become effective

   If tuple types have nothing to do with each other

If necessary for performance reasons, tuple types can be denormalized

   (Re)introduces problems for not normalized tuple types

For performance reasons or because of limitations for the target DBMS, you may want to split tuple types vertically or horizontally





Notes:


Tuple types are converted into tables as follows:

   Each tuple type becomes a table

   Each elementary attribute becomes a column

   Each elementary primary key attribute becomes a primary key column

Built-in data types include data types for:

   Numeric data: INTEGER, SMALLINT, BIGINT, DECIMAL, NUMERIC, REAL, DOUBLE

   Single-byte character strings: CHARACTER, VARCHAR, CLOB

   Double-byte character strings: GRAPHIC, VARGRAPHIC, DBCLOB

   Binary strings: BLOB

   Datetime data: DATE, TIME, TIMESTAMP

Columns can be defined as nullable or NOT NULL

   Nullable: Column need not assume a value for every row

   NOT NULL: Column must assume a value for every row

Columns can assume system-provided or user-provided default values

To implement an abstract data type, you need:

   User defined distinct types

   User defined functions

   Check constraints and/or triggers






Notes:


User defined distinct types are based on built-in data types

   Restrict allowed operations to comparisons for data of user defined distinct type

   Prevent illegal comparisons between different data types

User defined functions allow you to provide your own functions

User defined functions can be scalar functions, column functions, or table functions

User defined functions can be external or sourced functions

   External functions: Use a program provided by you

   Sourced functions: Extend existing built-in or user defined functions

Check constraints allow you to restrict the values columns can assume

A trigger is a set of actions (SQL statements) to be performed when a specific event occurs

Triggers are activated:

   By INSERT, UPDATE, or DELETE statements

   For each row or once per statement

   Before or after changes have been applied

   Only when a prerequisite condition is satisfied



Unit 8. Integrity Rules

This unit discusses the different types of integrity to be achieved for a good design. It presents methods for implementing the various types of integrity.



• Describe the different types of integrity to be enforced for a database.

• Explain the integrity rules for referential integrity.

• Establish the referential constraints for the tables of an application domain.

• Draw the referential structure for the tables of an application domain.

• Know how to ensure the integrity of redundant information.

• Implement business constraints.


Accountability:





Notes:

This unit discusses the different types of integrity that must be enforced for a database. They are:

• Referential integrity

• Domain integrity

• Redundancy integrity

• Constraint integrity

The unit will describe: the integrity rules that can be enforced to achieve referential integrity; how the referential constraints can be implemented; and how to establish the referential structure for an application domain. The referential structure provides a graphical overview of the referential constraints.

The unit will not discuss domain integrity in detail since it has been discussed in the previous unit. It will discuss how the integrity of redundant information can be achieved.

Furthermore, the unit will explain how constraint integrity can be enforced, i.e., how the business constraints for an application domain can be implemented.

Unit Objectives


Describe the different types of integrity to be enforced for a database

Explain the integrity rules for referential integrity

Establish the referential constraints for the tables of an application domain

Draw the referential structure for the tables of an application domain

Know how to ensure the integrity of redundant information

Implement business constraints




8.1 Referential Integrity



Figure 8-2. Integrity Rules in Design Process CF182.0

Notes:

This unit deals with the establishment of the integrity rules for the database being designed. Thus, we are in the third step of the storage view.

Integrity Rules in Design Process

[Diagram of the design process with the elements Problem Statement, Conceptual View, Tuple Types, Tables, Integrity Rules, and Indexes]






Figure 8-3. Integrity - Areas of Concern and Types CF182.0

Notes:

As we have seen, multiple tables are using the same columns. For example, column Type_Code occurs in tables AIRCRAFT_TYPE and AIRCRAFT_MODEL. The values it can assume in table AIRCRAFT_MODEL are dependent on the values of the column in table AIRCRAFT_TYPE since they are references to rows in table AIRCRAFT_TYPE. Therefore, they must always be a subset of the current values of column Type_Code in AIRCRAFT_TYPE.

It is a concern of database design to ensure that references to other tables are always correct. The appropriate integrity is referred to as referential integrity.

A similar concern is the correctness of the values in the columns of the tables. A column must only assume values allowed by the abstract data type for the data element associated with the column. Furthermore, the values must be within the limits defined by the domain for the data element.

Column Type_Code mentioned above must only assume codes for valid aircraft types. Similarly, column Number_of_Engines for table AIRCRAFT_TYPE must only assume integer values between 0 and 4.

Integrity - Areas of Concern and Types

Correctness of references to other tables: Referential Integrity

Correctness of values and domains for columns: Domain Integrity

Consistency of redundant information: Redundancy Integrity

Observance of business constraints: Constraint Integrity




The corresponding type of integrity is referred to as value integrity or, more commonly, domain integrity.
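As a minimal sketch of how such a domain restriction might be enforced, the following example defines a check constraint for column Number_of_Engines. It uses Python's sqlite3 module as a stand-in for the target database management system. The data types, the choice of Type_Code as primary key of AIRCRAFT_TYPE, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE aircraft_type (
        type_code         CHAR(4)  NOT NULL PRIMARY KEY,
        number_of_engines SMALLINT NOT NULL
            CHECK (number_of_engines BETWEEN 0 AND 4)
    )
""")


def insert_type(type_code, number_of_engines):
    """Return True if the row satisfies the domain, False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO aircraft_type VALUES (?, ?)",
            (type_code, number_of_engines),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = insert_type("B747", 4)  # within the domain
rejected = insert_type("X999", 6)  # hypothetical type, outside the domain
print(accepted, rejected)
```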

The third cause for concern is the redundant storage of information in the tables of the application domain. Redundancy can occur as the consequence of the repetitive storage of data or the storage of data that can be derived from other stored data (derivable data). If redundancy cannot be avoided (e.g., for performance reasons), redundant information that business processes rely on being current must be kept consistent at all times.

The corresponding type of integrity is referred to as redundancy integrity.

The data in the tables are also not correct if they violate business constraints (business rules) for the application domain. This may be a rule as simple as that an employee cannot be a pilot and a mechanic at the same time. It may also be a more complex rule such as that a mechanic can only be assigned to the maintenance of an aircraft if he/she has been trained for the appropriate aircraft model.

The corresponding type of integrity is referred to as (business) constraint integrity.

The integrity of data can be jeopardized by maintenance operations, i.e., insert, delete, or update operations. Therefore, to guarantee the integrity of the data, rules must be established that govern these types of operations and that must be followed whenever they are performed. The rules are referred to as integrity rules. In accordance with the type of operation to which they apply, the rules are referred to as Insert Rules, Delete Rules, and Update Rules, respectively.





Figure 8-4. Referential Integrity - Terminology CF182.0

Notes:

In conjunction with referential integrity, you need to be familiar with the following terms:

Key

A logically ordered set of columns of a table. The physical order of the columns in the table is not relevant. If the key consists of multiple columns, it is referred to as a composite key.

Parent Key

A (logically ordered) set of columns of a table that uniquely identifies the rows of the table. This need not be the primary key of the table. However, since we have established a primary key for every table and the primary key can be a parent key, this course will assume that the parent key is the primary key.

On the visual, the parent key is the primary key of table AIRCRAFT_MODEL. It is a composite key. It consists of columns Type_Code and Model_Number. We will define that Type_Code is the first column and Model_Number the second column.

Visual: Referential Integrity - Terminology

AIRCRAFT_MODEL (parent table of AIRCRAFT; parent/primary key: Type_Code, Model_Number)

Type_Code  Model_Number  Length_of_Model
A340       200           59.40
A310       300           46.67
B737       300           33.41
B747       400           70.67

AIRCRAFT (dependent table of AIRCRAFT_MODEL with foreign key Type_Code, Model_Number; parent table of ENGINE)

Aircraft_Number  Date_Manufactured  Type_Code  Model_Number
B474001323       1994-10-12         B747       400
B373004518       1999-02-28         B737       300
B373004519       1999-03-31         B737       300
A103000534       1998-05-12         A310       300
A103003167       1997-08-01         A310       300
A402004217       1999-10-23         A340       200

ENGINE (dependent table of AIRCRAFT with foreign key Aircraft_Number; "-" means no value)

Engine_Number  Engine_Type  Aircraft_Number
PW9880193      PW4062       B474001323
PW9880194      PW4062       B474001323
PW9880195      PW4062       -
PW9882345      PW4062       B474001323
PW9974034      PW4062       B474001323
R375184566     CF6-80C2     A103003167
R375184567     CF6-80C2     -
R375184568     CF6-80C2     A103003167




Foreign Key

A key which relates to the parent key of another table or the same table and whose values must always be a subset of the values of the related parent key. Meaning and order of the parent-key and foreign-key columns must be the same. The names of the columns can be different. There is a one-to-one correspondence of the columns.

As mentioned before, we will always use the primary key of a table as parent key so that the foreign key relates to a primary key.

On the visual, columns Type_Code and Model_Number together, and in that order, are a foreign key of table AIRCRAFT referring to primary key (Type_Code, Model_Number) of table AIRCRAFT_MODEL.

Referential Constraint

The correlation existing between a foreign key and the corresponding parent key.

On the visual, the correlation between foreign key (Type_Code, Model_Number) of table AIRCRAFT and primary key (Type_Code, Model_Number) of table AIRCRAFT_MODEL represents a referential constraint.

The arrow illustrating a referential constraint in a diagram points from the parent key to the foreign key. A single-headed arrow is used if a parent key value can occur only once as foreign key value. A double-headed arrow is used if a parent key value can occur more than once as foreign key value.

Since each foreign key value can only occur once as parent key value, an arrowhead is not necessary for the inverse direction.

Parent Table

The table of a referential constraint that contains the parent key.

On the visual, AIRCRAFT_MODEL is the parent table for the referential constraint between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of AIRCRAFT_MODEL.

Dependent Table

The table of a referential constraint that contains the foreign key.

On the visual, AIRCRAFT is the dependent table for the referential constraint between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of AIRCRAFT_MODEL.




Self-Referencing Constraint

A referential constraint whose parent key and foreign key belong to the same table. For a self-referencing constraint, the parent table and the dependent table are the same.

The referential constraint between columns Owning_Record (foreign key) and Maintenance_Number (parent key) of table MAINTENANCE_RECORD is a self-referencing constraint.

Self-Referencing Table

A table having a self-referencing constraint.

Table MAINTENANCE_RECORD for Come Aboard is a self-referencing table since it has the self-referencing constraint mentioned before.

Parent Row

A row of the parent table whose parent key value exists as foreign key value in the dependent table.

On the visual, all rows of AIRCRAFT_MODEL are parent rows for the referential constraint between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of AIRCRAFT_MODEL.

Dependent Row

A row of the dependent table whose foreign key contains a value.

On the visual, all rows of AIRCRAFT are dependent rows for the referential constraint between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of AIRCRAFT_MODEL.

Referential Integrity

For a referential constraint, referential integrity exists if, for every foreign key value of the dependent table, the appropriate parent key value exists in the parent table.

On the visual, referential integrity exists for the referential constraint between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of AIRCRAFT_MODEL.

The visual illustrates a second referential constraint: the referential constraint between column Aircraft_Number (foreign key) of table ENGINE and the primary key of table AIRCRAFT (parent key). For this referential constraint, AIRCRAFT is the parent table and ENGINE the dependent table.

The row for aircraft B373004518 is not a parent row since none of ENGINE's rows is dependent on it. The row for engine PW9880195 is not a dependent row since column Aircraft_Number does not contain a value for it.
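The terminology can be tried out on a cut-down copy of the tables on the visual. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for DB2; the column types and the choice of sample rows are assumptions made for illustration. It declares the composite parent key and foreign key, and then queries for parent rows and dependent rows of the constraint between AIRCRAFT and ENGINE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only if asked to
conn.executescript("""
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code TEXT NOT NULL,
    Model_Number TEXT NOT NULL,
    Length_of_Model REAL,
    PRIMARY KEY (Type_Code, Model_Number)           -- composite parent key
);
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Date_Manufactured TEXT,
    Type_Code TEXT NOT NULL,
    Model_Number TEXT NOT NULL,
    FOREIGN KEY (Type_Code, Model_Number)           -- composite foreign key
        REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number)
);
CREATE TABLE ENGINE (
    Engine_Number TEXT PRIMARY KEY,
    Engine_Type TEXT,
    Aircraft_Number TEXT REFERENCES AIRCRAFT (Aircraft_Number)  -- nullable
);
""")
conn.executemany("INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?)",
                 [("B737", "300", 33.41), ("B747", "400", 70.67)])
conn.executemany("INSERT INTO AIRCRAFT VALUES (?, ?, ?, ?)",
                 [("B474001323", "1994-10-12", "B747", "400"),
                  ("B373004518", "1999-02-28", "B737", "300")])
conn.executemany("INSERT INTO ENGINE VALUES (?, ?, ?)",
                 [("PW9880193", "PW4062", "B474001323"),
                  ("PW9880195", "PW4062", None)])

# Parent rows of AIRCRAFT: aircraft referenced by at least one engine.
parent_rows = [r[0] for r in conn.execute("""
    SELECT Aircraft_Number FROM AIRCRAFT a
    WHERE EXISTS (SELECT 1 FROM ENGINE e
                  WHERE e.Aircraft_Number = a.Aircraft_Number)""")]

# Dependent rows of ENGINE: engines whose foreign key contains a value.
dependent_rows = [r[0] for r in conn.execute(
    "SELECT Engine_Number FROM ENGINE WHERE Aircraft_Number IS NOT NULL")]

print(parent_rows, dependent_rows)
```

In this reduced data set, aircraft B373004518 is not reported as a parent row and engine PW9880195 is not reported as a dependent row, which matches the discussion above.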




The referential integrity for a referential constraint must be controlled via insert, delete, and update rules for the parent table and the dependent table. For different referential constraints, the integrity rules may be different.

For each referential constraint, you can decide if you want the database management system to enforce the referential integrity or if you want to take care of it yourself. You may even decide not to care about referential integrity or to check it only periodically and correct problems when you find time.





Figure 8-5. Referential Integrity - Insert Rules CF182.0

Notes:

For the insertion of rows, the following rules ensure the integrity of referential constraints:

• If the referential constraint is not a self-referencing constraint, rows can be added to the parent table at all times.

If the referential constraint is a self-referencing constraint, the parent table is the dependent table as well and the restrictions for dependent tables apply.

The insertion of rows into the parent table for a referential constraint may also be impaired by the parent table being the dependent table of another referential constraint.

• A row may be added to the dependent table for a referential constraint if:

- The foreign key for the row does not contain a value (if allowed for the columns of the foreign key).

- The foreign key value has a matching parent key value in the parent table.

In short: A row can be inserted into the dependent table only if its foreign key does not have a value or matches an existing parent key value, i.e., an appropriate parent row exists.

Visual: Referential Integrity - Insert Rules

Parent table (AIRCRAFT_MODEL): Always

    INSERT INTO AIRCRAFT_MODEL ( Type_Code, Model_Number, . . . )
    VALUES ( 'B777', '200', . . . )

Dependent table (AIRCRAFT): Only if parent row exists

    INSERT INTO AIRCRAFT ( Aircraft_Number, Type_Code, Model_Number, . . . )
    VALUES ( 'B373004863', 'B737', '300', . . . )

    INSERT INTO AIRCRAFT ( Aircraft_Number, Type_Code, Model_Number, . . . )
    VALUES ( 'A006003012', 'A300', '600', . . . )

AIRCRAFT_MODEL (parent table)

Type_Code  Model_Number  Length_of_Model
A340       200           59.40
A310       300           46.67
B737       300           33.41
B747       400           70.67

AIRCRAFT (dependent table)

Aircraft_Number  Date_Manufactured  Type_Code  Model_Number
B474001323       1994-10-12         B747       400
B373004518       1999-02-28         B737       300
B373004519       1999-03-31         B737       300
A103000534       1998-05-12         A310       300
A103003167       1997-08-01         A310       300
A402004217       1999-10-23         A340       200




For the referential constraint on the visual, rows can be added to table AIRCRAFT_MODEL without restrictions. The row for aircraft number B373004863 can be added to table AIRCRAFT since its foreign key value (B737, 300) has a matching parent row in table AIRCRAFT_MODEL.

The row for aircraft number A006003012 cannot be added to table AIRCRAFT because AIRCRAFT_MODEL does not contain a row for aircraft model (A300, 600).
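The three insertions discussed for this visual can be replayed as follows. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) in place of DB2. The helper try_insert is hypothetical, and the columns elided with ". . ." on the visual are filled with NULL here because the source does not give their values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code TEXT NOT NULL, Model_Number TEXT NOT NULL, Length_of_Model REAL,
    PRIMARY KEY (Type_Code, Model_Number));
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY, Date_Manufactured TEXT,
    Type_Code TEXT NOT NULL, Model_Number TEXT NOT NULL,
    FOREIGN KEY (Type_Code, Model_Number)
        REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number));
INSERT INTO AIRCRAFT_MODEL VALUES ('B737', '300', 33.41);
""")

def try_insert(table, values):
    """Hypothetical helper: return True if the row was accepted, False if rejected."""
    placeholders = ", ".join("?" * len(values))
    try:
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
        return True
    except sqlite3.IntegrityError:  # raised when the insert rule is violated
        return False

# Parent table: rows can always be added.
parent_ok = try_insert("AIRCRAFT_MODEL", ("B777", "200", None))
# Dependent table: accepted, parent row (B737, 300) exists.
dependent_ok = try_insert("AIRCRAFT", ("B373004863", None, "B737", "300"))
# Dependent table: rejected, no parent row (A300, 600).
orphan_ok = try_insert("AIRCRAFT", ("A006003012", None, "A300", "600"))

print(parent_ok, dependent_ok, orphan_ok)
```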





Figure 8-6. Referential Integrity - Delete Rules CF182.0

Notes:

For the deletion of rows, the following delete rules ensure the integrity of referential constraints:

• If the referential constraint is not a self-referencing constraint, a row can be deleted from the dependent table at any time.

If the referential constraint is a self-referencing constraint, the dependent table is the parent table at the same time. Since dependent rows may be parent rows at the same time, the delete rule for the referential constraint may prevent the deletion of dependent rows or cause the deletion of additional rows.

The deletion of rows from the dependent table of a referential constraint can also be impaired by other referential constraints for which the dependent table is the parent table.

• For the deletion of rows from the parent table of a referential constraint, one of the following options can be chosen:

Visual: Referential Integrity - Delete Rules

Delete rules of the referential constraints on the visual:

AIRCRAFT_MODEL -> AIRCRAFT: NA
AIRCRAFT -> ENGINE: SN
AIRCRAFT -> SEAT: C

AIRCRAFT_MODEL

Type_Code  Model_Number  Length_of_Model
A340       200           59.40
A310       300           46.67
B737       300           33.41
B747       400           70.67

AIRCRAFT

Aircraft_Number  Date_Manufactured  Type_Code  Model_Number
B474001323       1994-10-12         B747       400
B373004518       1999-02-28         B737       300
B373004519       1999-03-31         B737       300
A103000534       1998-05-12         A310       300
A103003167       1997-08-01         A310       300
A402004217       1999-10-23         A340       200

ENGINE ("-" means no value)

Engine_Number  Engine_Type  Aircraft_Number
PW9880193      PW4062       B474001323
PW9880194      PW4062       B474001323
PW9880195      PW4062       -
PW9882345      PW4062       B474001323
PW9974034      PW4062       B474001323
R375184566     CF6-80C2     A103003167
R375184567     CF6-80C2     -
R375184568     CF6-80C2     A103003167

SEAT

Aircraft_Number  Seat_Number
B474001323       1A
B474001323       1B
B474001323       1C
. . .            . . .
B474001323       46J
B171004217       1A
B171004217       1B
. . .            . . .
B171004217       28G




NO ACTION

For NO ACTION, rows of the parent table can only be deleted if none of the rows of the dependent table becomes an orphan, i.e., does not have a matching parent key value afterwards.

Conceptually, first, all rows requested by the delete request are deleted. After they have been deleted, it is checked if the dependent table contains rows dependent on the deleted rows and now being orphans. If so, the deletions are backed out and the request is rejected. Thus, NO ACTION checks for conflicts after the deletion.

The subsequent RESTRICT option is very similar to NO ACTION, but its effects may be different. NO ACTION is the SQL92 standard whereas RESTRICT is a DB2 implementation.

On the visual, NO ACTION (abbreviated as NA) has been chosen for the illustrated constraint between tables AIRCRAFT_MODEL and AIRCRAFT. This means that an aircraft model can only be deleted if an aircraft is no longer dependent on it.

RESTRICT (DB2 only)

For RESTRICT, rows of the parent table can only be deleted if they are not parent rows, i.e., if none of the rows of the dependent table refer to them.

Conceptually, the checking for dependent rows is done before rows are deleted and not after all rows have been deleted. If a conflict is detected, the request is rejected.

As you can see from this, RESTRICT has a performance advantage over NO ACTION in case of conflicts.

The effects of NO ACTION and RESTRICT may be different for self-referencing constraints because the dependent rows are in the same table as the deleted rows. If considering each deletion of a row individually, as RESTRICT does, there may be conflicts which disappear when considering the deletions collectively as NO ACTION does. NO ACTION is successful if the dependent rows are deleted as well whereas RESTRICT fails if there are any initial dependencies between the rows to be deleted. Thus, DB2 doesn’t allow the usage of RESTRICT for self-referencing constraints.
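The difference between checking after the deletion and checking before each deletion can be observed on a small self-referencing table. The following is a sketch only, assuming SQLite (through Python's sqlite3 module), which, unlike DB2 as described above, accepts RESTRICT on a self-referencing constraint. Table MAINTENANCE_RECORD is reduced to its two key columns, and the helper delete_all is hypothetical.

```python
import sqlite3

def delete_all(rule):
    """Delete a parent row and its dependent row in one statement under the given rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(f"""
        CREATE TABLE MAINTENANCE_RECORD (
            Maintenance_Number INTEGER PRIMARY KEY,
            Owning_Record INTEGER
                REFERENCES MAINTENANCE_RECORD (Maintenance_Number)
                ON DELETE {rule})""")
    # Record 2 is owned by record 1, so record 1 is a parent row.
    conn.executemany("INSERT INTO MAINTENANCE_RECORD VALUES (?, ?)",
                     [(1, None), (2, 1)])
    try:
        conn.execute(
            "DELETE FROM MAINTENANCE_RECORD WHERE Maintenance_Number IN (1, 2)")
        return "deleted"
    except sqlite3.IntegrityError:
        return "rejected"

no_action_result = delete_all("NO ACTION")  # checked once the statement has finished
restrict_result = delete_all("RESTRICT")    # checked as each parent row is deleted
print(no_action_result, restrict_result)
```

Under NO ACTION, the request succeeds because no orphan remains once both rows are gone. Under RESTRICT, the deletion of record 1 is refused while record 2 still refers to it.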

SET NULL

The foreign key values of rows dependent on deleted parent rows are deleted, i.e., the dependencies are removed.

SET NULL is only an option if the foreign key of the dependent table need not have a value for every row. This raises the question of when a composite foreign key is considered to have no value for a row. In general, the foreign key of a row is considered not to have a value if at least one column of the foreign key does not have a value. Thus, SET NULL is only an option if at least one of the foreign key columns has been defined as nullable. SET NULL sets all foreign key columns that have been defined as nullable to NULL.




On the visual, SET NULL (abbreviated as SN) has been chosen for the illustrated referential constraint between tables AIRCRAFT and ENGINE. This means that, for an engine, the reference to the aircraft is removed if the aircraft is deleted. Practically, this implies that the engine is no longer mounted on an aircraft.

SET NULL can be used because column Aircraft_Number of table ENGINE has been defined as nullable.

CASCADE

Rows dependent on deleted parent rows are deleted as well.

On the visual, CASCADE (abbreviated as C) has been chosen for the illustrated referential constraint between tables AIRCRAFT and SEAT. Thus, if an aircraft is deleted, information about its seats is no longer kept.
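The three delete rules chosen on the visual can be exercised together. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) in place of DB2 and a reduced set of columns and rows. It first tries to delete an aircraft model that still has a dependent aircraft, then deletes the aircraft and inspects tables ENGINE and SEAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code TEXT NOT NULL, Model_Number TEXT NOT NULL,
    PRIMARY KEY (Type_Code, Model_Number));
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Type_Code TEXT NOT NULL, Model_Number TEXT NOT NULL,
    FOREIGN KEY (Type_Code, Model_Number)
        REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number)
        ON DELETE NO ACTION);                                   -- NA
CREATE TABLE ENGINE (
    Engine_Number TEXT PRIMARY KEY,
    Aircraft_Number TEXT REFERENCES AIRCRAFT (Aircraft_Number)
        ON DELETE SET NULL);                                    -- SN
CREATE TABLE SEAT (
    Aircraft_Number TEXT NOT NULL REFERENCES AIRCRAFT (Aircraft_Number)
        ON DELETE CASCADE,                                      -- C
    Seat_Number TEXT NOT NULL,
    PRIMARY KEY (Aircraft_Number, Seat_Number));
INSERT INTO AIRCRAFT_MODEL VALUES ('B747', '400');
INSERT INTO AIRCRAFT VALUES ('B474001323', 'B747', '400');
INSERT INTO ENGINE VALUES ('PW9880193', 'B474001323'), ('PW9880194', 'B474001323');
INSERT INTO SEAT VALUES ('B474001323', '1A'), ('B474001323', '1B');
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

try:  # NO ACTION: the model still has a dependent aircraft
    conn.execute("DELETE FROM AIRCRAFT_MODEL WHERE Type_Code = 'B747'")
    model_delete_rejected = False
except sqlite3.IntegrityError:
    model_delete_rejected = True

# Deleting the aircraft unmounts its engines (SET NULL) and removes its seats (CASCADE).
conn.execute("DELETE FROM AIRCRAFT WHERE Aircraft_Number = 'B474001323'")
engines_left = count("SELECT COUNT(*) FROM ENGINE")
engines_mounted = count("SELECT COUNT(*) FROM ENGINE WHERE Aircraft_Number IS NOT NULL")
seats_left = count("SELECT COUNT(*) FROM SEAT")
print(model_delete_rejected, engines_left, engines_mounted, seats_left)
```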




Figure 8-7. Referential Integrity - Update Rules CF182.0

Notes:

The relational data model defines the following update rules for referential constraints:

• The foreign key values of dependent rows can be changed to matching parent key values of the parent table or can be deleted (set to NULL) if permitted. As explained for the delete rules, the deletion of foreign key values generally requires at least one of the foreign key columns being defined as nullable.

• For the updating of parent key values, the relational data model provides the following options:

NO ACTION

For NO ACTION, the parent key values of rows of the parent table can only be changed if none of the rows of the dependent table becomes an orphan, i.e., does not have a matching parent value, afterwards.

Visual: Referential Integrity - Update Rules

Dependent Table

Foreign key values can be changed to other existing parent key values of the parent table or can be deleted (set to NULL) if permitted.

Parent Table

NO ACTION: The parent key values of parent rows can only be changed if the dependent table does not have orphans afterwards.
RESTRICT: Can only change parent key values of rows that are not parent rows.
SET NULL: If the parent key value of a parent row is changed, the foreign key values of all dependent rows are set to NULL (if permitted).
CASCADE: If the parent key value of a parent row is changed, the foreign key values of all dependent rows are changed accordingly.

Most systems only support NO ACTION or RESTRICT.




As for the delete rules, conceptually, the checking for dependent rows is done after all parent key values have been changed. If orphans are detected, the changes are rolled back and the request is rejected.

RESTRICT (DB2 only)

For RESTRICT, the parent key values of rows of the parent table can only be changed if they are not parent rows, i.e., if none of the rows of the dependent table refer to them.

As for the delete rules, conceptually, the checking for dependent rows is done before the parent key values are changed. If dependent rows are detected, the request is rejected.

SET NULL

The foreign key values of all rows dependent on parent rows whose parent key values are changed are deleted, i.e., the dependencies are removed.

SET NULL is only an option if the foreign key of the dependent table need not have a value for every row. The same foreign key considerations apply as for the delete rules.

CASCADE

The foreign key values of all dependent rows are changed to the new parent key values of their parent rows.
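To compare the four update options side by side, the following sketch changes the parent key value of a parent row under each of them. It assumes SQLite (through Python's sqlite3 module), which happens to accept all four options in an ON UPDATE clause; this is not a statement about DB2. The reduced tables, the helper update_parent_key, and the new aircraft number are made up for illustration.

```python
import sqlite3

def update_parent_key(rule):
    """Change a parent key value under the given update rule and report the effect."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE AIRCRAFT (Aircraft_Number TEXT PRIMARY KEY);
        CREATE TABLE ENGINE (
            Engine_Number TEXT PRIMARY KEY,
            Aircraft_Number TEXT REFERENCES AIRCRAFT (Aircraft_Number)
                ON UPDATE {rule});
        INSERT INTO AIRCRAFT VALUES ('B474001323');
        INSERT INTO ENGINE VALUES ('PW9880193', 'B474001323');
    """)
    try:
        # 'B474009999' is a made-up new parent key value.
        conn.execute("UPDATE AIRCRAFT SET Aircraft_Number = 'B474009999'")
    except sqlite3.IntegrityError:
        return "rejected"
    # Report the foreign key value of the formerly dependent row.
    return conn.execute("SELECT Aircraft_Number FROM ENGINE").fetchone()[0]

results = {rule: update_parent_key(rule)
           for rule in ("NO ACTION", "RESTRICT", "SET NULL", "CASCADE")}
print(results)
```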

For parent tables, most database management systems (in particular, DB2) only support NO ACTION or RESTRICT. To change the parent key value for a parent row, you can use the following procedure:

1. Add an identical row with the new parent key value to the parent table.

2. Change the foreign key values of the dependent rows to the new parent key value.

3. Delete the former parent row. (Since the foreign key values of the formerly dependent rows have been changed, the former parent row is no longer a parent row and, therefore, can be deleted.)
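The three steps above can be sketched as follows. The sketch assumes SQLite (through Python's sqlite3 module) in place of DB2, reduced versions of tables AIRCRAFT_MODEL and AIRCRAFT, and a made-up new model number '350'. The steps run in one transaction so that either all of them or none of them take effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code TEXT NOT NULL, Model_Number TEXT NOT NULL, Length_of_Model REAL,
    PRIMARY KEY (Type_Code, Model_Number));
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Type_Code TEXT NOT NULL, Model_Number TEXT NOT NULL,
    FOREIGN KEY (Type_Code, Model_Number)     -- SQLite defaults to NO ACTION
        REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number));
INSERT INTO AIRCRAFT_MODEL VALUES ('B737', '300', 33.41);
INSERT INTO AIRCRAFT VALUES ('B373004518', 'B737', '300');
INSERT INTO AIRCRAFT VALUES ('B373004519', 'B737', '300');
""")

old_key, new_key = ("B737", "300"), ("B737", "350")  # the new key value is made up

try:  # changing the parent key value directly would leave orphans behind
    conn.execute("UPDATE AIRCRAFT_MODEL SET Model_Number = '350'")
    direct_update_rejected = False
except sqlite3.IntegrityError:
    direct_update_rejected = True

with conn:  # commits on success, rolls back if any step fails
    # 1. Add an identical row with the new parent key value.
    conn.execute("""INSERT INTO AIRCRAFT_MODEL
                    SELECT ?, ?, Length_of_Model FROM AIRCRAFT_MODEL
                    WHERE Type_Code = ? AND Model_Number = ?""", new_key + old_key)
    # 2. Point the dependent rows to the new parent key value.
    conn.execute("""UPDATE AIRCRAFT SET Type_Code = ?, Model_Number = ?
                    WHERE Type_Code = ? AND Model_Number = ?""", new_key + old_key)
    # 3. Delete the former parent row, which no longer has dependent rows.
    conn.execute("DELETE FROM AIRCRAFT_MODEL WHERE Type_Code = ? AND Model_Number = ?",
                 old_key)

models = conn.execute("SELECT Type_Code, Model_Number FROM AIRCRAFT_MODEL").fetchall()
aircraft_keys = conn.execute(
    "SELECT DISTINCT Type_Code, Model_Number FROM AIRCRAFT").fetchall()
print(direct_update_rejected, models, aircraft_keys)
```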

Alternatively, you can temporarily remove the referential constraint and reestablish it after the parent key values have been changed. However, this requires that the integrity of the referential constraint be checked by means of utilities before the dependent table can be processed again.




Figure 8-8. Delete Rules and ER Model (1 of 8) CF182.0

Notes:

As mentioned before, we will always use the primary key of the parent table as parent key. Therefore, we will no longer use the term parent key and only talk about primary key/foreign key relationships in conjunction with referential constraints in the remainder of the unit.

The existence of a referential constraint means that rows of the dependent table refer to rows of the parent table. This implies an interrelationship between the parent table and the dependent table. Since the tables are derived from tuple types, which are derived from entity types and relationship types, referential constraints are the consequence of relationship types of the entity-relationship model. In many cases, the entity-relationship model also helps you determine the proper delete rules for the referential constraints as illustrated by the next series of visuals.

The above visual discusses the resulting delete rules if one of the tables is for (a tuple type belonging to) a dependent entity type. The key of the dependent entity type contains, as a part, the key of the parent entity type or relationship type. For each dependent instance, the appropriate parent instance must exist at all times.

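[Visual: Delete Rules and ER Model (1 of 8). Four cases of a dependent entity type (D) between Source and Target, with a maximum cardinality of m or 1 for the dependent entity type. Left two cases (controlling property not specified): delete rule NA between tables SOURCE and TARGET. Right two cases (controlling property C specified): delete rule C between tables SOURCE and TARGET.]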




The interrelationships between source and target instances are expressed by the key of the dependent entity type. As we know, a tuple type, and thus a table, is not established for the owning relationship type. For the established tables, the interrelationships between the instances (rows) are expressed by the primary key of the dependent table. The primary key of the parent table is part of the primary key of the dependent table and constitutes a foreign key for the dependent table.

The left two examples illustrate the delete rule to be chosen if the controlling property has not been chosen for the dependent entity type. In these cases, the instances of the dependent entity type, and thus the rows of the dependent table, are dependent on the existence of the appropriate parent instances or rows. Since controlling has not been specified for the dependent entity type, a parent instance (row) cannot be deleted as long as an instance (row) is dependent on it. Consequently, the proper delete option for the referential constraint is NO ACTION (or RESTRICT) independently of the cardinality for the dependent entity type.

If the controlling property has been specified for the dependent entity type, dependent instances, and thus rows, are to be deleted if the associated parent instances (rows) are deleted. Thus, in this case, the proper delete option for the referential constraint is CASCADE as illustrated for the right two examples on the visual.





Notes:

The cases on this visual consider m:m relationship types. For them, tuple types and tables are established for source, target, and relationship type. Let us call them SOURCE, TARGET, and R, respectively. Since the relationship key consists of the keys of source and target, the primary key of R consists of the primary keys of SOURCE and TARGET. None of the columns of the primary key may be nullable.

For a relationship instance, the associated source and target instances must exist at all times. Therefore, for each row of R, rows with corresponding primary key values must exist in SOURCE and TARGET. Consequently, as part of the primary key of R, the primary keys of SOURCE and TARGET constitute foreign keys for R.

For relationship instances, the rule applies that they are deleted if their source or target instances are deleted and the relationship instance can be deleted. Whether or not and when a relationship instance can be deleted is controlled by the minimum cardinalities for the relationship type. The controlling property may also have a certain effect as will be illustrated on the next visual. If a relationship instance cannot be deleted, its source or target instances cannot be deleted.

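[Visual: four cases of an m:m relationship type r between Source and Target, with tables SOURCE, TARGET, and R. Top left (cardinalities m and m): delete rule C for both referential constraints. Top right and bottom right (one end m, the other end 1. .m): delete rule NA for one referential constraint and C for the other. Bottom left (cardinalities 1. .m and 1. .m): delete rule NA for both referential constraints.]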




The cases on the visual assume that the controlling property has not been specified for either end of the relationship type. For the top left case, the minimum cardinalities for both ends of the relationship type are 0. Thus, a relationship instance need not exist for a source or target instance and nothing prevents the deletion of relationship instances. As a consequence, if a parent row of one of the two parent tables is deleted, the corresponding dependent rows must be deleted as well. Thus, CASCADE is the proper delete rule for both referential constraints.

For the top right case, the minimum cardinality for the source is 1 and the minimum cardinality for the target is 0. Consequently, for each target instance, at least one relationship instance must exist. Since the controlling property has not been specified for the target, the deletion of a relationship instance should not cause the automatic deletion of its target instance.

Both points together disallow the deletion of a source instance if this would result in a target instance without a relationship instance. They require a delete rule of NO ACTION (or RESTRICT) for the referential constraint between the tables for the source and the relationship type: You do not allow the deletion of a row of the table for the source as long as a row is dependent on it; otherwise, you could delete the row for the last relationship instance for the target.

Note that this does not completely match the meaning of minimum cardinality 1 for the source of the relationship type, but this is as close as you can get.

Since the minimum target cardinality is 0, a source instance need not have a relationship instance. Thus, a relationship instance can be and must be deleted if its target instance is deleted. For the referential constraint between the tables for the target and the relationship type, this translates into a delete rule of CASCADE.

For the bottom right case, the roles of source and target have been reversed. Thus, the delete rules are: CASCADE for the referential constraint between the tables for the source and the relationship type; NO ACTION (or RESTRICT) for the referential constraint between the tables for the target and the relationship type.

For the bottom left case, both minimum cardinalities are 1. This means that a relationship instance cannot be deleted if it is the last relationship instance for the source instance or the target instance. This translates into delete rules of NO ACTION (or RESTRICT) for both referential constraints with the caveat mentioned above.

Whenever you have a delete rule of NO ACTION or RESTRICT, you must get rid of the dependent rows (relationship instances) first. Only then can you delete the parent rows.
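The top right case can be sketched as follows: NO ACTION between the tables for the source and the relationship type, CASCADE between the tables for the target and the relationship type. The sketch assumes SQLite (through Python's sqlite3 module) in place of DB2. The column names Source_Key and Target_Key and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SOURCE (Source_Key INTEGER PRIMARY KEY);
CREATE TABLE TARGET (Target_Key INTEGER PRIMARY KEY);
CREATE TABLE R (
    Source_Key INTEGER NOT NULL REFERENCES SOURCE (Source_Key) ON DELETE NO ACTION,
    Target_Key INTEGER NOT NULL REFERENCES TARGET (Target_Key) ON DELETE CASCADE,
    PRIMARY KEY (Source_Key, Target_Key));
INSERT INTO SOURCE VALUES (1), (2);
INSERT INTO TARGET VALUES (10), (20);
INSERT INTO R VALUES (1, 10), (2, 10), (1, 20);
""")

# Deleting a target row removes its relationship rows as well (CASCADE).
conn.execute("DELETE FROM TARGET WHERE Target_Key = 20")
rows_after_target_delete = conn.execute(
    "SELECT Source_Key, Target_Key FROM R ORDER BY 1, 2").fetchall()

# Deleting a source row is refused while relationship rows depend on it (NO ACTION).
try:
    conn.execute("DELETE FROM SOURCE WHERE Source_Key = 2")
    source_delete_rejected = False
except sqlite3.IntegrityError:
    source_delete_rejected = True

# Get rid of the dependent rows first; then the parent row can be deleted.
conn.execute("DELETE FROM R WHERE Source_Key = 2")
conn.execute("DELETE FROM SOURCE WHERE Source_Key = 2")
sources_left = [r[0] for r in conn.execute("SELECT Source_Key FROM SOURCE")]
print(rows_after_target_delete, source_delete_rejected, sources_left)
```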





Notes:

For the previous visual, we assumed that the controlling property had not been specified for either end of the relationship type. As a consequence, the deletion of a relationship instance did not affect any source or target instances. Likewise, the deletion of a row of the table for the relationship type did not have an effect on rows of the tables for source and target.

If the controlling property has been specified for an end of the relationship type, the deletion of the row for a relationship instance affects rows in other tables: If the controlling property has been specified for the source, the row for the source instance must be deleted. If the controlling property has been specified for the target, the row for the target instance must be deleted. As on the visual, let us assume that the controlling property has been specified for the target of the relationship type. Then, the deletion of the row for a relationship instance should automatically trigger the deletion of the row for the target instance (from the target table).

Referential integrity does not provide for the automatic deletion of rows in tables other than the dependent table. For the example on the visual, you can still use NO ACTION or RESTRICT for the referential constraint between the tables for the source and the relationship type. This would prevent rows for target instances without corresponding rows for relationship instances, but would ignore the controlling property.

[Visual: m:m relationship type r between Source and Target with the controlling property (C) specified for the target; tables SOURCE, TARGET, and R. Delete rules for the two referential constraints: either C and NA, or C and C in conjunction with the following trigger.]

    CREATE TRIGGER . . .
      AFTER DELETE ON R
      REFERENCING OLD AS O
      FOR EACH ROW MODE DB2SQL
      BEGIN ATOMIC
        DELETE FROM TARGET
          WHERE primary-key = O.foreign-key;
      END

If you want to implement the controlling property correctly and your database management system supports triggers, you can do the following:

• Use a delete rule of CASCADE for the referential constraint between the table for the end opposing the controlling property and the table for the relationship type.

In the example, this is for the referential constraint between the tables for the source and the relationship type. (The other delete rule is already CASCADE because of minimum cardinality 0 for the target.)

• To achieve the automatic deletion of the source or target instance for the relationship instance, you need a trigger for the table for the relationship type. This trigger must be an after trigger activated on each deletion of a row from the table for the relationship type. It must delete the row of the controlled end of the relationship type associated with the row being deleted. The controlled end is the end for which the controlling property has been specified.

In the example on the visual, the controlling property has been specified for the target. Therefore, the appropriate row in the table for the target must be deleted. The trigger shown on the visual will achieve this. The WHERE clause of the DELETE statement has been simplified. It assumes that the primary key and the foreign key are not composite keys. If they were composite keys, multiple predicates, combined by logical ANDs, would be needed in the WHERE clause.

Note that the correlation name of the REFERENCING clause is needed to refer to the foreign key value of the row for the relationship instance being deleted.
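A runnable variant of this approach is sketched below. It assumes SQLite (through Python's sqlite3 module) rather than DB2, so the trigger syntax differs from the visual: SQLite has no REFERENCING clause and refers to the deleted row as OLD. The trigger name R_CONTROLS_TARGET and the key column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SOURCE (Source_Key INTEGER PRIMARY KEY);
CREATE TABLE TARGET (Target_Key INTEGER PRIMARY KEY);
CREATE TABLE R (
    Source_Key INTEGER NOT NULL REFERENCES SOURCE (Source_Key) ON DELETE CASCADE,
    Target_Key INTEGER NOT NULL REFERENCES TARGET (Target_Key) ON DELETE CASCADE,
    PRIMARY KEY (Source_Key, Target_Key));

-- After trigger: each deleted relationship row takes its target row with it.
CREATE TRIGGER R_CONTROLS_TARGET
    AFTER DELETE ON R
    FOR EACH ROW
BEGIN
    DELETE FROM TARGET WHERE Target_Key = OLD.Target_Key;
END;

INSERT INTO SOURCE VALUES (1);
INSERT INTO TARGET VALUES (10), (20);
INSERT INTO R VALUES (1, 10);
""")

# Deleting the relationship row activates the trigger, which deletes target 10.
conn.execute("DELETE FROM R WHERE Source_Key = 1 AND Target_Key = 10")
targets_left = [r[0] for r in conn.execute("SELECT Target_Key FROM TARGET")]
relationships_left = conn.execute("SELECT COUNT(*) FROM R").fetchone()[0]
print(targets_left, relationships_left)
```

Target 20 is kept because no relationship row for it was deleted.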





Notes:

The cases on this visual consider 1:m relationship types. For simplicity, let us assume that the maximum cardinality for the source is 1. (If the maximum cardinality for the target is 1 instead, just reverse the directions of the relationship type.) In this case, the tuple type for the relationship type can be integrated or imbedded into the tuple type for the target.

Tables are created for the source tuple type and the extended target tuple type. As foreign key, the table for the extended target tuple type contains the primary key of the table for the source.

If the minimum source cardinality is 0, a relationship instance need not exist for a target instance. Thus, if a source instance is deleted, the target does not prevent the deletion of relationship instances for the source.

If the controlling property has not been specified for the target (top left case), this translates into a delete rule of SET NULL for the referential constraint. CASCADE would be wrong since it would delete the row for the target instance as well. NO ACTION or RESTRICT would be too restrictive since it would prevent the deletion of the relationship instance.

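[Visual: four cases of a 1:m relationship type between Source and Target, with tables SOURCE and TARGET. Top left (source cardinality 1, controlling property not specified): delete rule SN. Top right (source cardinality 1, controlling property C specified for the target): delete rule C. Bottom left (source cardinality 1. .1, controlling property not specified): delete rule NA. Bottom right (source cardinality 1. .1, controlling property C specified for the target): delete rule C.]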




If the controlling property has been specified (top right case), the corresponding target instance should be deleted as well if a relationship instance is deleted. Thus, CASCADE is the proper delete rule.

The bottom two cases deal with a minimum cardinality of 1 for the source of the relationship type. Consequently, for each target instance, at least one relationship instance must exist.

For the bottom right case, the controlling property has been specified for the target meaning that the target instance should be deleted as well if a relationship instance is deleted. Thus, the minimum source cardinality of 1 does not block the deletion of the relationship instance and CASCADE is the proper delete rule for the referential constraint.

For the bottom left case, the controlling property has not been specified for the target. Accordingly, the deletion of a relationship instance should not cause the automatic deletion of its target instance. Minimum cardinality 1 together with the absence of the controlling property disallow the deletion of a source instance if this would result in a target instance without a relationship instance. They require a delete rule of NO ACTION (or RESTRICT) for the referential constraint: You do not allow the deletion of a row of the table for the source as long as a row is dependent on it; otherwise, you could delete the row for the last relationship instance for the target.

Note that this does not completely match the meaning of minimum cardinality 1 for the source of the relationship type, but this is as close as you can get.
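The cases for 1:m relationship types differ only in the delete rule and in whether the foreign key of the target table is nullable. The following sketch, assuming SQLite (through Python's sqlite3 module) in place of DB2, deletes a source row under each combination. The helper delete_source and the key column names are made up for illustration.

```python
import sqlite3

def delete_source(rule, nullable):
    """Delete the only source row and report what happens to the target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    null_clause = "" if nullable else "NOT NULL"
    conn.executescript(f"""
        CREATE TABLE SOURCE (Source_Key INTEGER PRIMARY KEY);
        CREATE TABLE TARGET (
            Target_Key INTEGER PRIMARY KEY,
            Source_Key INTEGER {null_clause}
                REFERENCES SOURCE (Source_Key) ON DELETE {rule});
        INSERT INTO SOURCE VALUES (1);
        INSERT INTO TARGET VALUES (10, 1);
    """)
    try:
        conn.execute("DELETE FROM SOURCE WHERE Source_Key = 1")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT Target_Key, Source_Key FROM TARGET").fetchall()

# Minimum source cardinality 0, no controlling property: SET NULL.
top_left = delete_source("SET NULL", nullable=True)
# Controlling property specified for the target: CASCADE.
right_cases = delete_source("CASCADE", nullable=False)
# Minimum source cardinality 1, no controlling property: NO ACTION.
bottom_left = delete_source("NO ACTION", nullable=False)
print(top_left, right_cases, bottom_left)
```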

For 1:1 relationship types, the delete rules are determined in the same manner. The only consideration that is different is that you have a choice for the relationship key. It can either be the key of the source or the key of the target. Depending on the choice, the tuple type for the relationship type can be integrated into the tuple type for the source or for the target. The table for the tuple type into which the tuple type for the relationship type is integrated contains the foreign key.





Notes:

In Unit 6 - Tuple Types, we saw that tuple types must not be provided for m:m relationship types being the source (target) of another relationship type with a minimum target (source) cardinality of 1. We will study the delete rules for the appropriate cases.

On the current visual and the next four visuals, the m:m relationship type and the other relationship type are called r1 and r2, respectively. The source and target of r1 are called A and B. Without loss of generality, we assume that r1 is the source of r2. The target for r2 is called C.

Furthermore, we assume for this visual and the next that the maximum source cardinality of r2 is m; otherwise, the tuple type for r2 could be combined with the tuple type for its target.

As we saw in Unit 6 - Tuple Types, a tuple type for r1 must not be provided since the tuple type for r2 accurately describes the relationship instances for r1. Therefore, tables must only be created for A, B, C, and r2. Let us name them TA, TB, TC, and TR2, respectively. To describe the relationship instances for r1, TR2 contains columns for the primary keys of TA and TB. In addition, it contains columns for the primary key of TC.


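[Visual: two cases of an m:m relationship type r1 between A and B that is the source of relationship type r2 with target C (target cardinality 1. .m); tables TA, TB, TC, and TR2. Upper case (source cardinality of r2 is m): delete rule C for the referential constraints between TA and TR2 and between TB and TR2, delete rule NA between TC and TR2. Lower case (source cardinality of r2 is 1. .m): delete rule NA for all three referential constraints.]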




Since the relationship key for r1 and the key of C are defining attributes for r2, the appropriate columns of TR2 must not be nullable. They are even foreign keys of TR2 because their values must exist as primary key values in the respective parent tables.

For the cases on this visual, the minimum cardinality is 0 for both the source and the target of r1. This means that the affiliated relationship instances can be deleted if a source or target instance is deleted.

For the upper case on the visual, the minimum source cardinality for r2 is 0 implying that a relationship instance need not exist for an instance of C. Consequently, the deletion of the source or target instance for a relationship instance of r1 must remove the relationship instance and any relationship instances of r2 for which it is the source. This translates into delete rules of CASCADE for the referential constraints between TA and TR2 and TB and TR2.

For the referential constraint between TC and TR2, the following considerations apply: The minimum target cardinality of 1 for r2 requires an instance of r2 for each instance of r1. Since the controlling property has not been specified for the source of r2, the delete rule must be NO ACTION or RESTRICT; otherwise, the last row of TR2 for an instance of r1 could be deleted when a row of TC is deleted.

Again, note that delete rule NO ACTION (or RESTRICT) does not completely match the meaning of the minimum target cardinality.
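The delete rules for the upper case can be tried out in a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for the target database management system, which is an assumption, as are the column names a_id, b_id, and c_id. Only the table names TA, TB, TC, and TR2 come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE TA (a_id INTEGER PRIMARY KEY);
CREATE TABLE TB (b_id INTEGER PRIMARY KEY);
CREATE TABLE TC (c_id INTEGER PRIMARY KEY);
-- TR2 describes r2 and, through a_id and b_id, the instances of r1.
CREATE TABLE TR2 (
    a_id INTEGER NOT NULL REFERENCES TA ON DELETE CASCADE,
    b_id INTEGER NOT NULL REFERENCES TB ON DELETE CASCADE,
    c_id INTEGER NOT NULL REFERENCES TC ON DELETE NO ACTION,
    PRIMARY KEY (a_id, b_id, c_id)
);
INSERT INTO TA VALUES (1), (2);
INSERT INTO TB VALUES (10);
INSERT INTO TC VALUES (100), (200);
INSERT INTO TR2 VALUES (1, 10, 100), (2, 10, 200);
""")

# Deleting a row of TA cascades to the rows of TR2 that reference it.
conn.execute("DELETE FROM TA WHERE a_id = 1")
remaining = conn.execute("SELECT a_id, b_id, c_id FROM TR2").fetchall()

# Deleting a row of TC that is still referenced from TR2 is rejected.
try:
    conn.execute("DELETE FROM TC WHERE c_id = 200")
    tc_delete_blocked = False
except sqlite3.IntegrityError:
    tc_delete_blocked = True
```

The row of TC with c_id 100 is no longer referenced after the cascade and could be deleted, which is the gap between NO ACTION and the minimum cardinality that the notes point out.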

For the second case on the visual, the minimum source cardinality of r2 is 1. Therefore, each instance of C requires an instance of r2. Since the controlling property has not been specified for C, the delete rules for the referential constraints between TA and TR2 and TB and TR2 must be NO ACTION or RESTRICT; otherwise, the last row of TR2 for an instance of C could be deleted when rows of TA or TB are deleted.

Again, note that delete rule NO ACTION (or RESTRICT) does not completely match the meaning of the minimum source cardinality.

If the controlling property for C were specified, delete rules of CASCADE could be used for the referential constraints in conjunction with a trigger as outlined on page 8-22.

The trigger would need to delete the appropriate row in TC if a row of TR2 were deleted.





Notes:

The cases on this visual illustrate the delete rules if the minimum cardinality of source or target of r1 is 1.

If, for example, the minimum cardinality of the source of r1 is 1, a relationship instance must exist for each target instance of r1. The deletion of a source instance must not cause the deletion of the last relationship instance of r1 for a target instance. Consequently, the delete rule for the referential constraint between TA and TR2 must be NO ACTION or RESTRICT. (Again, with the caveat that this does not match completely the meaning of minimum cardinality 1.)

For the second case, both minimum cardinalities of r1 are 1. As a consequence, the delete rules between TA and TR2 and TB and TR2 must both be NO ACTION or RESTRICT.


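[Visual: the same structure of r1, r2, A, B, and C with minimum cardinality 1 for r1. First case: minimum cardinality 1 at one end of r1, with delete rules NA, C, and NA. Second case: minimum cardinality 1 at both ends of r1, with NA for all three constraints. (NA = NO ACTION, C = CASCADE)]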






Notes:

As we saw in Unit 6 - Tuple Types, tuple types need not be provided for r1 and r2, if:

• r1 is an m:m relationship type,
• r2 is an owning relationship type whose source or target is r1, and
• the minimum cardinality for the dependent entity type is 1.

This is because the key of r1 is part of the key of the dependent entity type and a dependent entity instance must exist for each instance of r1. Accordingly, tables are only established for A, B, and C. They are called TA, TB, and TC, respectively. The primary key of TC comprises foreign keys referring to TA and TB.

For the cases on the visual, the minimum cardinalities are 0 for the source and the target of r1.

Similar conclusions as for the previous visuals lead to delete rules of NO ACTION or RESTRICT if the controlling property has not been specified for the dependent entity type. The delete rules cannot be CASCADE since the deletion of rows of TA or TB would cause the deletion of rows of TC. The delete rules cannot be SET NULL either. SET NULL could cause instances for C without instances for r1, which is not allowed for dependent entity types.

[Visual: m:m relationship type r1 between A and B with dependent entity type C owned through r2. Tables TA and TB are parents of TC. First case (controlling property not specified): delete rules NA and NA. Second case (controlling property specified): delete rules C and C. (NA = NO ACTION, C = CASCADE)]

If the controlling property has been specified for the dependent entity type, both delete rules must be CASCADE. If instances of A or B are deleted, affiliated relationship instances of r1 should be deleted. Because of the controlling property for C, the associated dependent entity instances are to be deleted as well.
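The two outcomes can be compared side by side in a short sketch. SQLite via Python's sqlite3 module is assumed as the database, and the column names a_id, b_id, and c_seq as well as the helper delete_parent are hypothetical. The primary key of TC comprises the foreign keys referring to TA and TB, as described above.

```python
import sqlite3

def delete_parent(delete_rule):
    """Build TA, TB, TC with the given delete rule, delete one TA row, report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE TA (a_id INTEGER PRIMARY KEY);
    CREATE TABLE TB (b_id INTEGER PRIMARY KEY);
    CREATE TABLE TC (
        a_id  INTEGER NOT NULL REFERENCES TA ON DELETE {delete_rule},
        b_id  INTEGER NOT NULL REFERENCES TB ON DELETE {delete_rule},
        c_seq INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id, c_seq)
    );
    INSERT INTO TA VALUES (1), (2);
    INSERT INTO TB VALUES (10);
    INSERT INTO TC VALUES (1, 10, 1), (1, 10, 2), (2, 10, 1);
    """)
    try:
        conn.execute("DELETE FROM TA WHERE a_id = 1")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "blocked"
    rows = conn.execute(
        "SELECT a_id, b_id, c_seq FROM TC ORDER BY a_id, b_id, c_seq").fetchall()
    conn.close()
    return outcome, rows

# Controlling property specified for C: dependent rows go with their parent.
with_controlling = delete_parent("CASCADE")
# Controlling property not specified: the parent row cannot be deleted.
without_controlling = delete_parent("NO ACTION")
```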






Notes:

This visual illustrates two cases for which the minimum cardinalities for source or target of relationship type r1 are 1.

Let us discuss the first case for which the minimum cardinality of the source is 1. The minimum cardinality of 1 for A blocks the deletion of the last relationship instance for an instance of B if an instance of A is to be deleted. Thus, a delete rule of NO ACTION or RESTRICT is appropriate for the referential constraint between TA and TC. (Again, the caveat for the minimum cardinality of 1 applies.) Note that the delete rules are independent of whether or not the controlling property has been specified for the dependent entity type: In any case, you must block the deletion of the relationship instance for r1.

As a consequence of the controlling property for the dependent entity type, the delete rule for the referential constraint between TB and TC must be CASCADE. If the controlling property were not specified, the delete rule would be NO ACTION or RESTRICT.

For the second case on the visual, the minimum cardinalities are 1 for both ends of relationship type r1. This leads to delete rules of NO ACTION or RESTRICT for both referential constraints.


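[Visual: two cases with minimum cardinality 1 for r1 and the controlling property specified for dependent entity type C. First case (minimum cardinality 1 at A): delete rule NA between TA and TC and C between TB and TC. Second case (minimum cardinality 1 at both ends): NA for both constraints. (NA = NO ACTION, C = CASCADE)]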




Figure 8-16. Delete Rules for an Imbed Case CF182.0

Notes:

From the preceding discussions, we know that a tuple type is not required for relationship type r1. Its relationship instances are completely described by the tuples for r2. However, the source cardinality of 1 for r2 allows us to imbed the tuple type for r2 into the tuple type for its target C.

Accordingly, tables are only established for A, B, and extended tuple type C. Let us call them TA, TB, and TC, respectively. As the consequence of the imbedding, TC contains the primary keys of TA and TB as foreign keys and the appropriate columns must be defined as nullable.

If an instance of A or B is deleted, any relationship instances of r1 for it must be deleted as well. In turn, the relationship instances of r2 being dependent on the deleted instances of r1 must be deleted. The cardinalities for r2 do not prevent the deletion of the instances of r2. However, the target instances for the relationship instances (these are instances of C) must not be deleted.

For the two referential constraints, this seems to translate into delete rules of SET NULL. However, this is not quite sufficient. If a row of TA is deleted, only the references to it in TC are deleted. Likewise, if a row of TB is deleted, only the references to it in TC are deleted. However, the rows of TC describe relationship instances for r1 and not unrelated references to TA and TB. A relationship instance consists of a pair of references and not a single reference. Therefore, the references to TA and TB in a row of TC should be deleted at the same time.

You may decide not to care about a reference to TA or TB in TC if the other reference is NULL. However, if you want to correctly implement relationship type r1, you need a trigger synchronizing the foreign key columns when one of them is set to NULL. The trigger on the visual achieves this.

[Visual: Delete Rules for an Imbed Case. m:m relationship type r1 between A and B is the source of r2, whose target is C. TA and TB are parents of TC with delete rule SN (SET NULL) for both constraints. The visual also shows the following trigger:]

    CREATE TRIGGER trigger-name
      AFTER UPDATE ON TC
      REFERENCING NEW AS N
      FOR EACH ROW MODE DB2SQL
      WHEN ( (N.foreign-key-TA IS NULL AND N.foreign-key-TB IS NOT NULL)
          OR (N.foreign-key-TB IS NULL AND N.foreign-key-TA IS NOT NULL) )
      BEGIN ATOMIC
        UPDATE TC
          SET foreign-key-TA = NULL, foreign-key-TB = NULL
          WHERE primary-key = N.primary-key;
      END

The trigger is activated after a row of TC has been updated. The changing of foreign key columns by the referential integrity support is considered an update of the row. Some systems (e.g., DB2) do not consider it an update of the columns. Therefore, you should not specify individual columns (UPDATE OF ...).

The WHEN clause ensures that the triggered action is executed only if one of the new values for the foreign keys is NULL and the other is not. The triggered action, i.e., the UPDATE statement, sets both foreign key values for a row to NULL.

In the trigger, synonyms foreign-key-TA and foreign-key-TB are used to denote the foreign key columns in TC referring to TA and TB, respectively. The REFERENCING clause enables us to refer to the new values of the updated rows of TC.
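As a rough illustration, the same idea can be exercised in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite's trigger syntax differs from the DB2 syntax on the visual (NEW is available without a REFERENCING clause, and there is no MODE DB2SQL), the column names c_id, a_id, and b_id and the trigger name tc_sync_r1 are hypothetical, and the example relies on SQLite firing the update trigger for rows changed by the SET NULL delete rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE TA (a_id INTEGER PRIMARY KEY);
CREATE TABLE TB (b_id INTEGER PRIMARY KEY);
-- TC imbeds r2 and therefore carries nullable references to TA and TB.
CREATE TABLE TC (
    c_id INTEGER PRIMARY KEY,
    a_id INTEGER REFERENCES TA ON DELETE SET NULL,
    b_id INTEGER REFERENCES TB ON DELETE SET NULL
);
-- No column list (UPDATE OF ...) is given, as recommended in the notes.
CREATE TRIGGER tc_sync_r1
AFTER UPDATE ON TC
FOR EACH ROW
WHEN (NEW.a_id IS NULL AND NEW.b_id IS NOT NULL)
  OR (NEW.b_id IS NULL AND NEW.a_id IS NOT NULL)
BEGIN
    UPDATE TC SET a_id = NULL, b_id = NULL WHERE c_id = NEW.c_id;
END;
INSERT INTO TA VALUES (1), (2);
INSERT INTO TB VALUES (10), (20);
INSERT INTO TC VALUES (100, 1, 10), (101, 2, 10), (102, 2, 20);
""")

# SET NULL clears the reference to TA; the trigger clears the other half of the pair.
conn.execute("DELETE FROM TA WHERE a_id = 1")
after_ta_delete = conn.execute(
    "SELECT c_id, a_id, b_id FROM TC ORDER BY c_id").fetchall()

# The same holds when a row of TB is deleted.
conn.execute("DELETE FROM TB WHERE b_id = 20")
after_tb_delete = conn.execute(
    "SELECT c_id, a_id, b_id FROM TC ORDER BY c_id").fetchall()
```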




Figure 8-17. Delete Connection CF182.0

Notes:

A table T is delete-connected to a table T1 if the deletion of a row of T1 may require (immediate or indirect) accesses to T. For example, for the deletion of a row of T1, it may be necessary to determine if T contains rows with a foreign key value equal to the primary key value of the deleted row.

In the example on the visual, T is delete-connected to T1 over two paths of referential constraints:

• Let us first consider the left path of referential constraints. Delete rule CASCADE between tables T1 and T2 may cause the deletion of rows of T2 if a row is deleted from T1. Because of delete rule CASCADE between T2 and T3, this may, in turn, cause the deletion of rows from T3. Delete rule NO ACTION between T3 and T requires that table T is checked for matching foreign key values to determine if the rows of T3 can be deleted.

Thus, T is delete-connected to T1 via the left path. Of course, T2 and T3 are also delete-connected to T1.

• T is also delete-connected to T1 via the right path of referential constraints: Delete rule CASCADE between tables T1 and T4 may cause the deletion of rows of T4 if a row is deleted from T1. Delete rule NO ACTION between T4 and T requires that table T is checked for matching foreign key values to determine if the rows of T4 can be deleted.

Of course, T4 is also delete-connected to T1.

[Visual: Delete Connection. T is delete-connected to T1 over two paths: T1 to T2 (CASCADE), T2 to T3 (CASCADE), T3 to T (NO ACTION), and T1 to T4 (CASCADE), T4 to T (NO ACTION). The delete rules of the two constraints involving T must be the same and not SET NULL.]

For delete-connected tables, the following restriction applies:

If T is delete-connected to T1 via multiple paths with different referential constraints for T, then the delete rules for the referential constraints involving T must be the same and must not be SET NULL.

Otherwise, the result of the deletion of a row from T1 would depend on the sequence the various paths are processed in by the database management system. The relational data model requires, however, that the result be independent of the sequence chosen. Also, the number of variations could be so large that checking if the result were the same for all processing sequences of the paths must be ruled out.
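The restriction lends itself to a mechanical check during design. The following is a minimal sketch, not a feature of any particular database management system: the names Constraint, cascade_reach, entry_constraints, and violates_restriction are hypothetical, and delete-connection is approximated as "T is a dependent of T1 or of a table that deletions from T1 cascade to".

```python
from collections import namedtuple

# One referential constraint: parent table, dependent table, delete rule.
Constraint = namedtuple("Constraint", "parent dependent rule")

def cascade_reach(start, constraints):
    """Tables whose rows may be deleted when a row of `start` is deleted."""
    reached, todo = {start}, [start]
    while todo:
        table = todo.pop()
        for c in constraints:
            if c.parent == table and c.rule == "CASCADE" and c.dependent not in reached:
                reached.add(c.dependent)
                todo.append(c.dependent)
    return reached

def entry_constraints(t, t1, constraints):
    """Constraints through which `t` is delete-connected to `t1`."""
    reach = cascade_reach(t1, constraints)
    return [c for c in constraints if c.dependent == t and c.parent in reach]

def violates_restriction(t, t1, constraints):
    """True if several constraints connect `t` to `t1` and their rules differ or are SET NULL."""
    rules = [c.rule for c in entry_constraints(t, t1, constraints)]
    if len(rules) < 2:
        return False
    return len(set(rules)) > 1 or "SET NULL" in rules

# The referential constraints of the visual.
visual = [
    Constraint("T1", "T2", "CASCADE"),
    Constraint("T2", "T3", "CASCADE"),
    Constraint("T3", "T", "NO ACTION"),
    Constraint("T1", "T4", "CASCADE"),
    Constraint("T4", "T", "NO ACTION"),
]
visual_ok = not violates_restriction("T", "T1", visual)

# Changing one of the two rules for T makes the design invalid.
mixed = visual[:4] + [Constraint("T4", "T", "CASCADE")]
mixed_violates = violates_restriction("T", "T1", mixed)
```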




Figure 8-18. Referential Cycles CF182.0

Notes:

A referential cycle is a sequence of referential constraints leading back to the same table.

The visual shows two cycles: First, it illustrates a cycle consisting of multiple referential constraints involving tables T1, T2, and T3. Then, it illustrates a cycle just involving a single table, i.e., a self-referencing constraint.

For referential cycles, the following restrictions apply:

1. In a referential cycle of two or more tables, the tables must not be delete-connected to themselves.

This implies that at least two of the delete rules must not be CASCADE.

2. The delete rule for a self-referencing constraint must be CASCADE or NO ACTION. It cannot be RESTRICT or SET NULL.

In both cases, the result of operations deleting multiple rows from a table would depend on the sequence in which the rows are deleted. The relational data model requires the results to be independent of the processing sequences of the rows.

[Visual: Referential Cycles. A referential cycle over tables T1, T2, and T3, annotated "At least two must not be CASCADE", and a self-referencing constraint, annotated "Must be NO ACTION or CASCADE".]

Note that the table of a self-referencing constraint is always delete-connected to itself.
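The two restrictions, as stated in these notes, can be written down as a small design-time check. The function name cycle_is_allowed is hypothetical, and the sketch simply encodes the rules above rather than the behavior of a specific database management system.

```python
def cycle_is_allowed(rules):
    """Check the delete rules of the constraints that form one referential cycle.

    `rules` holds one delete rule per constraint in the cycle; a list with a
    single rule describes a self-referencing constraint.
    """
    if len(rules) == 1:
        # Self-referencing constraint: only CASCADE or NO ACTION is permitted.
        return rules[0] in ("CASCADE", "NO ACTION")
    # Cycle of two or more tables: at least two rules must not be CASCADE.
    non_cascade = sum(1 for rule in rules if rule != "CASCADE")
    return non_cascade >= 2

three_table_cycle = cycle_is_allowed(["SET NULL", "CASCADE", "NO ACTION"])
too_many_cascades = cycle_is_allowed(["CASCADE", "CASCADE", "NO ACTION"])
self_reference = cycle_is_allowed(["SET NULL"])
```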



Figure 8-19. Definition of Referential Constraints CF182.0

Notes:

If you want to use your database management system for enforcing a referential constraint between two tables, you must define the referential constraints to the database management system. A referential constraint concerns two tables: the parent table and the dependent table.

Assuming that the parent key of the parent table is the primary key (as we do), you must define which columns form the primary key for the table. If the primary key is a composite key, you must define the sequence of the columns for the primary key. This sequence is relevant for the foreign key of the dependent table. The referential constraint itself must be defined for the dependent table. First, you must specify the columns of the foreign key. They must be specified in the same sequence as the corresponding primary key columns. Next, you must specify the parent table. In addition, you must specify the delete rule and, if your database management system allows it and gives you a choice, the update rule for the referential constraint.

You may also give the referential constraint a name. The name can be used to delete the constraint again if it is no longer needed.
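To make the steps concrete, here is a minimal sketch for the composite key of AIRCRAFT_MODEL. It assumes SQLite via Python's sqlite3 module; the constraint name FK_AIRCRAFT_MODEL, the column data types, and the sample values are hypothetical. The foreign key columns are listed in the same sequence as the primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Parent table: the sequence of the primary key columns is defined here.
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code    TEXT NOT NULL,
    Model_Number TEXT NOT NULL,
    PRIMARY KEY (Type_Code, Model_Number)
);
-- Dependent table: named constraint, foreign key columns, parent table, rules.
CREATE TABLE AIRCRAFT (
    Aircraft_Number INTEGER PRIMARY KEY,
    Type_Code       TEXT NOT NULL,
    Model_Number    TEXT NOT NULL,
    CONSTRAINT FK_AIRCRAFT_MODEL
        FOREIGN KEY (Type_Code, Model_Number)
        REFERENCES AIRCRAFT_MODEL
        ON DELETE NO ACTION
        ON UPDATE NO ACTION
);
INSERT INTO AIRCRAFT_MODEL VALUES ('B747', '400');
INSERT INTO AIRCRAFT VALUES (1, 'B747', '400');
""")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Values supplied in the wrong column sequence do not match any parent row.
wrong_sequence = rejected("INSERT INTO AIRCRAFT VALUES (2, '400', 'B747')")
# NO ACTION: a model that still has aircraft cannot be deleted.
parent_delete = rejected("DELETE FROM AIRCRAFT_MODEL WHERE Type_Code = 'B747'")
```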

Definition of Referential Constraints

Parent Table:

    PRIMARY KEY( pk-column-1, pk-column-2, . . . )

Dependent Table:

    CONSTRAINT constraint-name
    FOREIGN KEY( fk-column-1, fk-column-2, . . . )
    REFERENCES parent-table
    ON DELETE delete-rule
    ON UPDATE update-rule


Figure 8-20. Referential Integrity - Documentation CF182.0

Notes:

The documentation for a referential constraint should be added to the documentation for the dependent table. For each referential constraint, provide the following information:

• A name for the referential constraint. You should name each referential constraint. The name must be unique per dependent table, but we suggest making it unique for the application domain.

• An ordered list of the columns making up the foreign key. The order of the columns must match the order of the corresponding primary key columns. The names can be different.

• The name of the parent table, i.e., the table containing the corresponding primary key.

• The delete rule for the referential constraint, i.e., NO ACTION, RESTRICT, SET NULL, or CASCADE.

• If your database management system gives you a choice, the update rule for the referential constraint. In most cases, this will be NO ACTION or RESTRICT.

• A unique constraint number. You should give each referential constraint of the application domain a unique number. This number is not needed for defining the referential constraint to the target database management system. It is used to identify the constraint in referential structures (described later in this topic).

Referential Integrity - Documentation

For each referential constraint, add to dependent table:

Constraint Number:    A unique number for the referential constraint. Used to identify the constraint in referential structures
Constraint Name:      Name for referential constraint. Must be unique for table. Should be unique for application domain
Foreign Key Columns:  Ordered list of columns for foreign key of referential constraint
Parent Table:         Name of parent table
Delete Rule:          NO ACTION, RESTRICT, SET NULL, or CASCADE
Update Rule:          NO ACTION or RESTRICT

A referential structure provides an overview of the referential constraints for the application domain or for a subset thereof. It shows how the tables for the application domain are interconnected by referential constraints. The constraint number in the referential structure serves as reference to the documentation for the referential constraint. It can be used to find details about the referential constraint.
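If the documentation is kept in machine-readable form, the constraint definition can be generated from it. The sketch below is one possible arrangement, not part of the course method: the class ConstraintDoc, its field names, and the constraint name FK_AIRCRAFT_MODEL are hypothetical, while the fields mirror the documentation items listed above.

```python
from dataclasses import dataclass

@dataclass
class ConstraintDoc:
    """Documentation record for one referential constraint of a dependent table."""
    number: int                  # constraint number used in referential structures
    name: str                    # constraint name
    foreign_key_columns: tuple   # ordered like the parent's primary key columns
    parent_table: str
    delete_rule: str
    update_rule: str = "NO ACTION"

    def to_clause(self):
        """Render the constraint clause for the dependent table's definition."""
        columns = ", ".join(self.foreign_key_columns)
        return (f"CONSTRAINT {self.name} FOREIGN KEY ({columns}) "
                f"REFERENCES {self.parent_table} "
                f"ON DELETE {self.delete_rule} ON UPDATE {self.update_rule}")

doc = ConstraintDoc(
    number=3,
    name="FK_AIRCRAFT_MODEL",
    foreign_key_columns=("Type_Code", "Model_Number"),
    parent_table="AIRCRAFT_MODEL",
    delete_rule="NO ACTION",
)
clause = doc.to_clause()
```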





Figure 8-21. Maintenance View - Updated ER Model CF182.0

Notes:

In Unit 4 - Entity-Relationship Model, we established the initial Maintenance View for our sample airline company called Come Aboard. This visual shows an update of the Maintenance View. It includes the changes caused by normalization.

The major changes are:

• Dependent entity type SEAT has been added as a result of normalization (First Normal Form).

• Entity types ENGINE and ENGINE LOCATION and relationship types ENGINE_on_AIRCRAFT and ENGINE_on_AIRCRAFT_in_ENGINE LOCATION have been added due to normalization (First Normal Form).

• Entity type MANUFACTURER and relationship types AIRCRAFT TYPE_from_MANUFACTURER and ENGINE_from_MANUFACTURER have been added due to normalization (Third Normal Form).

On the visual, the relationship types for which tuple types and, thus, tables need not be established have been grayed out.

[Visual: Maintenance View - Updated ER Model. Entity types AIRCRAFT TYPE, AIRCRAFT MODEL, AIRCRAFT, SEAT, ENGINE, ENGINE LOCATION, MANUFACTURER, MECHANIC, and MAINTENANCE RECORD with their relationship types (_has_, _in_, _for_, _from_, _on_, _trained_for_, _scheduled_for_, _belongs_to_) and cardinalities.]


Figure 8-22. Referential Structure CF182.0

Notes:

A referential structure is a graphical representation of the referential constraints for an application domain. It gives an overview of the referential constraints, not a detailed description.

The referential structure for the entire application domain may not fit onto a single page with the consequence that it must be split into subsets fitting onto a page. On the visual, we have concentrated on the subset corresponding to the (updated) Maintenance View for our sample airline company called Come Aboard.

The referential structure contains rectangles for all tables of the considered subset. The rectangles contain the names of the tables. A referential constraint between two tables is represented by a single-headed or double-headed arrow leading from the parent table to the dependent table. A single-headed arrow is used if a primary key value can occur at most once in the dependent table. A double-headed arrow is used if a primary key value can occur more than once in the dependent table. (Note that a foreign key value can occur only once in the parent table.)

[Visual: Referential Structure for Maintenance View. Tables AIRCRAFT_TYPE, AIRCRAFT_MODEL, AIRCRAFT, SEAT, ENGINE, MANUFACTURER, MECHANIC, MECHANIC_FOR_AM, MECHANIC_FOR_AC, and MAINTENANCE_RECORD connected by arrows for constraints 1 through 12, each labeled with its delete rule (NA, SN, or C).]

Next to the arrowhead and next to the dependent table, the delete rule for the referential constraint is specified. The abbreviations NA, R, SN, and C are used for NO ACTION, RESTRICT, SET NULL, and CASCADE, respectively.

A little square with a number is placed on the arrow to identify the referential constraint. We have already talked about that number, referred to as constraint number, in conjunction with the documentation of referential constraints. The constraint number identifies the documentation for the referential constraint being part of the documentation for the dependent table. You could think of using the constraint name instead, but the constraint name is generally too long and clumsy for use in diagrams.

You can also add a constraint summary, in the form of a listing or table (see page 8-46), providing the names of the tables involved and the foreign key columns.

As mentioned before, the referential structure for the entire application domain may not fit onto a single page with the consequence that it must be split into subsets fitting onto a page. If possible, the subsets should correspond to the submodels you established for the entity-relationship model of the application domain. Otherwise, proceed in the same manner as for the entity-relationship model and establish referential (sub)structures for autonomous subareas or different views of the application domain. Only if they will not fit onto a single page, establish referential (sub)structures for sets of tables logically belonging together and fitting onto a page.

Generally, the referential (sub)structures of the various pages will overlap. Some tables and referential constraints will occur on multiple pages. The (sub)structures must not conflict with each other. Together, they must cover all referential constraints for the application domain.

Now, let us discuss the referential structure for the Maintenance View of Come Aboard in more detail:

• Tables must be established for all entity types of the Maintenance View with the exception of ENGINE LOCATION. Its tuple type together with the tuple type for relationship type ENGINE_on_AIRCRAFT could be imbedded in the tuple type for ENGINE.

Furthermore, tables must be established for all m:m relationship types, i.e., for MECHANIC_trained_for_AIRCRAFT MODEL and MECHANIC_scheduled_for_AIRCRAFT. The appropriate tables have been called MECHANIC_FOR_AM and MECHANIC_FOR_AC, respectively.

Tables are not needed for any 1:m relationship types. Their tuple types can be combined with tuple types for their source or target.

• Since the tuple type for relationship type AIRCRAFT TYPE_from_MANUFACTURER has been imbedded into the tuple type for AIRCRAFT TYPE, table AIRCRAFT_TYPE has a foreign key referring to table MANUFACTURER.




Because of minimum cardinality 1 for entity type MANUFACTURER and the absence of the controlling property for AIRCRAFT TYPE, the delete rule for the referential constraint must be NO ACTION. (An aircraft type must always have a manufacturer.)

• Because AIRCRAFT MODEL is a dependent entity type, table AIRCRAFT_MODEL contains, as foreign key, the primary key of table AIRCRAFT_TYPE. Since the controlling property has not been specified for the dependent entity type, the delete rule must be NO ACTION. (An aircraft model must always have an aircraft type.)

• Since the tuple type for relationship type AIRCRAFT MODEL_for_AIRCRAFT has been imbedded into the tuple type for AIRCRAFT, table AIRCRAFT has a foreign key referring to table AIRCRAFT_MODEL.

Because of minimum cardinality 1 for entity type AIRCRAFT MODEL and the absence of the controlling property for AIRCRAFT, the delete rule for the referential constraint must be NO ACTION. (An aircraft must always have an aircraft model.)

• Because SEAT is a dependent entity type, table SEAT contains, as foreign key, the primary key of table AIRCRAFT.

Since the controlling property has been specified for the dependent entity type, the delete rule must be CASCADE. (If the aircraft is removed, information about the seats on the aircraft need no longer be kept.)

• Since the tuple type for relationship type ENGINE_from_MANUFACTURER has been imbedded into the tuple type for ENGINE, table ENGINE has a foreign key referring to table MANUFACTURER.

Because of minimum cardinality 1 for entity type MANUFACTURER and the absence of the controlling property for ENGINE, the delete rule for the referential constraint must be NO ACTION. (An engine must always have a manufacturer.)

• As we mentioned before, the tuple types for entity type ENGINE LOCATION and relationship type ENGINE_on_AIRCRAFT have been imbedded into the tuple type for ENGINE. Therefore, table ENGINE has a foreign key referring to table AIRCRAFT.

Because the minimum cardinality of AIRCRAFT is 0 for relationship type ENGINE_on_AIRCRAFT, the delete rule must be SET NULL. (An engine need not be mounted on an aircraft.)

• Relationship type MECHANIC_trained_for_AIRCRAFT MODEL is an m:m relationship type. Therefore, table MECHANIC_FOR_AM has foreign keys referring to tables AIRCRAFT_MODEL and MECHANIC, respectively.

Since the minimum cardinalities for both ends of the relationship type are 0, both delete rules must be CASCADE. (The relationship between an aircraft model and a mechanic can be deleted if either one is "deleted".)

• Relationship type MECHANIC_scheduled_for_AIRCRAFT is an m:m relationship type. Therefore, table MECHANIC_FOR_AC has foreign keys referring to tables AIRCRAFT and MECHANIC, respectively.




Since the minimum cardinalities for both ends of the relationship type are 0, both delete rules must be CASCADE. (The relationship between an aircraft and a mechanic can be deleted if either one is "deleted".)

• Since the tuple type for relationship type MAINTENANCE RECORD_from_MECHANIC has been imbedded into the tuple type for MAINTENANCE RECORD, table MAINTENANCE_RECORD has a foreign key referring to table MECHANIC.

Because of minimum cardinality 1 for entity type MECHANIC and the absence of the controlling property for MAINTENANCE RECORD, the delete rule for the referential constraint must be NO ACTION. (A maintenance record must always have a mechanic.)

• Since the tuple type for relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD has been imbedded into the tuple type for MAINTENANCE RECORD, table MAINTENANCE_RECORD has a self-referencing constraint.

Because of the controlling property for the target end of the relationship type, the delete rule for the referential constraint must be CASCADE. (A maintenance record should be thrown away if its owning maintenance record is deleted.)

If you assumed that the controlling property had not been specified, the delete rule should be SET NULL. (If the owning maintenance record is deleted, the dependent records are kept, but their references to the owning record are reset.) However, the restrictions for self-referencing constraints would not allow us to choose SET NULL as delete rule. We would have to choose NO ACTION because CASCADE, the other alternative, would delete the dependent records.
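A few of these constraints can be tried out on a reduced schema. The sketch below assumes SQLite via Python's sqlite3 module and covers only constraints 4, 6, and 12. The columns Seat_Number, Engine_Number, and Record_Number and the sample rows are hypothetical, and the remaining tables and foreign keys of the Maintenance View are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE AIRCRAFT (Aircraft_Number INTEGER PRIMARY KEY);
-- Constraint 4: SEAT is a dependent entity type with the controlling property.
CREATE TABLE SEAT (
    Aircraft_Number INTEGER NOT NULL REFERENCES AIRCRAFT ON DELETE CASCADE,
    Seat_Number     TEXT NOT NULL,
    PRIMARY KEY (Aircraft_Number, Seat_Number)
);
-- Constraint 6: an engine need not be mounted on an aircraft.
CREATE TABLE ENGINE (
    Engine_Number   INTEGER PRIMARY KEY,
    Aircraft_Number INTEGER REFERENCES AIRCRAFT ON DELETE SET NULL
);
-- Constraint 12: self-referencing constraint with the controlling property.
CREATE TABLE MAINTENANCE_RECORD (
    Record_Number INTEGER PRIMARY KEY,
    Owning_Record INTEGER REFERENCES MAINTENANCE_RECORD ON DELETE CASCADE
);
INSERT INTO AIRCRAFT VALUES (1), (2);
INSERT INTO SEAT VALUES (1, '1A'), (1, '1B'), (2, '1A');
INSERT INTO ENGINE VALUES (501, 1), (502, 2);
INSERT INTO MAINTENANCE_RECORD VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);
""")

# Removing aircraft 1 removes its seats and unmounts its engine.
conn.execute("DELETE FROM AIRCRAFT WHERE Aircraft_Number = 1")
# Removing record 1 removes the records it owns, directly or indirectly.
conn.execute("DELETE FROM MAINTENANCE_RECORD WHERE Record_Number = 1")

seats = conn.execute("SELECT Aircraft_Number, Seat_Number FROM SEAT").fetchall()
engines = conn.execute(
    "SELECT Engine_Number, Aircraft_Number FROM ENGINE ORDER BY Engine_Number").fetchall()
records = conn.execute("SELECT Record_Number FROM MAINTENANCE_RECORD").fetchall()
```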




Figure 8-23. Referential Structure - Constraint Summary CF182.0

Notes:

This visual shows the constraint summary for the Maintenance View. The line numbers match the numbers for the constraints. For each constraint, the dependent table, the parent table, and the foreign key columns are listed. The numbers in front of foreign key columns specify their sequence in the foreign key.

Referential Structure - Constraint Summary

     Dependent Table      Parent Table         Foreign Key
 1   AIRCRAFT_TYPE        MANUFACTURER         Manufacturer_Code
 2   AIRCRAFT_MODEL       AIRCRAFT_TYPE        Type_Code
 3   AIRCRAFT             AIRCRAFT_MODEL       1: Type_Code, 2: Model_Number
 4   SEAT                 AIRCRAFT             Aircraft_Number
 5   ENGINE               MANUFACTURER         Manufacturer_Code
 6   ENGINE               AIRCRAFT             Aircraft_Number
 7   MECHANIC_FOR_AM      AIRCRAFT_MODEL       1: Type_Code, 2: Model_Number
 8   MECHANIC_FOR_AM      MECHANIC             Employee_Number
 9   MECHANIC_FOR_AC      AIRCRAFT             Aircraft_Number
10   MECHANIC_FOR_AC      MECHANIC             Employee_Number
11   MAINTENANCE_RECORD   MECHANIC             Employee_Number
12   MAINTENANCE_RECORD   MAINTENANCE_RECORD   Owning_Record




8.2 Other Types of Integrity



Figure 8-24. Domain Integrity CF182.0

Notes:

Domain integrity, also referred to as value integrity, deals with the correctness of values in columns of tables.

A column must only assume values allowed by the abstract data type for the data element associated with the column. Furthermore, the values must adhere to the domain specifications (restrictions) for the column's data element. They must also observe length requirements or restrictions for the data element such as the minimum length, the maximum length, the number of digits, or the number of decimal places. The length requirements may have been expressed by parameters for the abstract data type for the data element.

For example, column Type_Code for table AIRCRAFT must only assume 3-letter codes for valid airports. Column Number_of_Engines for table AIRCRAFT_TYPE must only assume integer values between 0 and 4. Column Last_Name of table EMPLOYEE may only assume values of abstract data type NAMEDATA and must not be longer than 50 characters.
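Two of these examples can be approximated with check constraints alone. The sketch below assumes SQLite via Python's sqlite3 module; the key columns, the helper accepted, and the sample values are hypothetical, and a full implementation of an abstract data type such as NAMEDATA would need more than a length check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AIRCRAFT_TYPE (
    Type_Code         TEXT PRIMARY KEY,
    -- Domain: integer values between 0 and 4.
    Number_of_Engines INTEGER NOT NULL
        CHECK (typeof(Number_of_Engines) = 'integer'
               AND Number_of_Engines BETWEEN 0 AND 4)
);
CREATE TABLE EMPLOYEE (
    Employee_Number INTEGER PRIMARY KEY,
    -- Length restriction: at most 50 characters.
    Last_Name       TEXT NOT NULL CHECK (length(Last_Name) <= 50)
);
""")

def accepted(sql, params):
    """Return True if the row satisfies the constraints, False if it is rejected."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

four_engines = accepted("INSERT INTO AIRCRAFT_TYPE VALUES (?, ?)", ("B747", 4))
five_engines = accepted("INSERT INTO AIRCRAFT_TYPE VALUES (?, ?)", ("X999", 5))
name_fits = accepted("INSERT INTO EMPLOYEE VALUES (?, ?)", (1, "A" * 50))
name_too_long = accepted("INSERT INTO EMPLOYEE VALUES (?, ?)", (2, "A" * 51))
```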

Domain Integrity

Domain integrity = Correctness of values and domains for columns

• Handled by abstract data types
  - Including values for abstract data type
  - Including domains for data elements
  - Including restrictions for lengths
• Discussed during previous unit
• Ingredients required
  - User defined distinct types
  - User defined functions
  - Triggers
  - Check constraints

The ingredients required to ensure domain integrity are user defined distinct types, user defined functions, triggers, and check constraints.

Since the implementation of domain integrity is closely related to the implementation of abstract data types described in Unit 7 - From Tuple Types to Tables, we need not discuss it further.




Figure 8-25. Redundancy Integrity CF182.0

Notes:

Redundancy integrity deals with the redundant storage of information in tables.

There are three major causes for the redundant storage of data:

• Violations of the Second Normal Form or Third Normal Form lead to the redundant storage of data in the same table. Nonkey columns are solely dependent on columns that are not primary key columns (Third Normal Form) or only on some of the primary key columns (Second Normal Form). To ensure consistency, the dependent columns must have the same values for all rows having the same values for the columns on which the dependency exists.

For this type of redundancy, you can maintain integrity as follows:

- Do not allow end users to maintain the tables concerned directly through SQL Data Manipulation Language (DML) statements (INSERT, UPDATE, or DELETE) via dynamic SQL.

- Instead, provide front-ends with attractive user interfaces for the business processes concerned. In the front-ends, use the proper program logic and SQL statements to ensure the consistency of the redundant data.

For update operations, if redundant information is changed, all rows having the same values for the columns on which the functional dependency exists must be changed at the same time.

If new rows are inserted, copy the redundant information from existing rows already containing the information rather than having the end user enter the information again. The end user must only provide the information the first time around, i.e., when the information does not yet exist.

Even though this does not make the information inconsistent, you may want to prevent the deletion of the last row for a value of the columns on which the functional dependency exists. However, you only need to do this if the redundant information is still needed.

- If your database management system supports triggers, you may be able to use triggers to ensure consistency of the redundant information for update operations. The next visual illustrates what such a trigger must look like.

[Visual: Redundancy Integrity. Summary of the measures described in these notes for the three causes of redundancy: violations of the 2nd and 3rd Normal Forms, multiple copies of data, and derivable data.]

• Multiple copies of the same data are a second cause for redundant information. For performance reasons, you may have decided to:

- repeat columns in other tables - provide multiple copies of entire tables

If the information in the various tables must be consistent at all times, you can use triggers to enforce the consistency.

Frequently, if you have provided multiple copies of entire tables, one of the tables is the master table and must be up-to-date at all times. The copies are only used for reference purposes and need not be up-to-date at all times. In this case, you should disallow inserts, updates, or deletes for the copies. In addition, you may want to provide new versions (refreshes) of the copies periodically or from time to time.

• Redundancy can also be caused by stored data that can be derived from other stored data. Data that can be derived from other data is referred to as derivable data.

For our sample airline company, all seats for an aircraft have a row in table SEAT and the number of seats on the aircraft is the number of rows in the table. Thus, the number of seats can be derived from the information in table SEAT. To avoid scanning table SEAT every time you need the number of seats, you may prefer to store the number of seats in the rows for the aircraft in AIRCRAFT. If the seat arrangement for an aircraft changes and you forget to update the appropriate row in table AIRCRAFT, the derivable data becomes wrong.

If your database management system supports triggers, you can use triggers to maintain the correctness of derivable data. Whenever data affecting the derivable data are changed, a trigger must reevaluate and store the derivable data. The triggers achieving this for the number of seats for our sample airline company are illustrated in Figure 8-27.

An alternative is not to store derivable data and to derive them every time they are needed. Which way is better depends on the usage profiles of your business processes. Most of the time, retrieval operations are much more frequent than insert, update, or delete operations (80-20 rule) and are more performance-critical. In that case, storing the derivable data and maintaining them with triggers is preferable.





Figure 8-26. Violation of Normal Forms - Trigger CF182.0

Notes:

As we discussed, violations of the Second Normal Form or Third Normal Form lead to the redundant storage of data in the same table. Nonkey columns are solely dependent on columns that are not primary key columns (Third Normal Form) or only on some of the primary key columns (Second Normal Form). To ensure consistency, the dependent columns must have the same values for all rows having the same values for the columns on which the dependency exists.

The visual illustrates how a trigger maintaining the integrity of the redundant information for update operations should look. In the visual the dependent columns are called dependent-column-1, dependent-column-2, and so on.

Furthermore, to simplify matters, it is assumed that the columns are dependent on a single column and that the primary key for the table is not composite; otherwise, additional AND operators would be needed in the WHERE clause. The column the dependent columns are functionally dependent on is called reference-column on the visual.

The trigger is activated on update requests for the table violating the Normal Form. It is executed for each row updated if any of the dependent columns, i.e., the columns containing the redundant information, is named in the UPDATE statement (UPDATE OF ...). However, the triggered action is only performed if the value of at least one of the dependent columns has actually changed (WHEN clause).

Violation of Normal Forms - Trigger

    CREATE TRIGGER SAME
    AFTER UPDATE OF dependent-column-1, dependent-column-2, . . .
    ON table-name
    REFERENCING NEW AS N OLD AS O
    FOR EACH ROW MODE DB2SQL
    WHEN ( N.dependent-column-1 <> O.dependent-column-1 OR
           N.dependent-column-2 <> O.dependent-column-2 OR
           . . . )
    BEGIN ATOMIC
      UPDATE table-name
      SET dependent-column-1 = N.dependent-column-1,
          dependent-column-2 = N.dependent-column-2,
          . . .
      WHERE reference-column = N.reference-column AND
            primary-key <> N.primary-key ;
    END

The triggered action changes the same table as the table being updated. It changes the values of the dependent columns of rows other than the row being updated (primary-key <> N.primary-key) to the new values for the updated row (SET ... dependent-column-n = N.dependent-column-n). It only changes the columns of the rows having the same value, as the row being updated, for the column the dependent columns are functionally dependent on (reference-column = N.reference-column).

You can easily see the importance of the REFERENCING clause in this case because we need to refer to three different states for a column: the state before the row was updated, the state after the row has been updated, and the column for the rows being updated by the triggered action.

Because the triggered action updates the same columns for the same table, the trigger is invoked recursively. Thus, without the proper precautions, looping could occur. The search condition of the WHEN clause prevents an endless recursion because all dependent columns will have the same old and new value after some iterations.

Depending on how the iterations are performed by your database management system, you may experience a serious performance degradation when using the trigger!
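The propagation described in these notes can be tried out with a small sketch. It assumes SQLite (through Python's sqlite3 module) instead of DB2, so the syntax differs from the visual: NEW and OLD take the place of N and O, and there is no MODE DB2SQL. The table EMPLOYEE and its columns emp_no, dept_no, and dept_name are hypothetical stand-ins for primary-key, reference-column, and dependent-column-1.

```python
import sqlite3


def build_employee_db():
    """Create a table violating 3NF: dept_name depends on dept_no, not on the key."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE employee (
            emp_no    INTEGER PRIMARY KEY,  -- hypothetical primary-key
            dept_no   TEXT NOT NULL,        -- hypothetical reference-column
            dept_name TEXT NOT NULL         -- hypothetical dependent-column-1
        );

        -- SQLite spelling of the trigger pattern on the visual.
        CREATE TRIGGER same_dept_name
        AFTER UPDATE OF dept_name ON employee
        FOR EACH ROW
        WHEN NEW.dept_name <> OLD.dept_name
        BEGIN
            UPDATE employee
               SET dept_name = NEW.dept_name
             WHERE dept_no = NEW.dept_no
               AND emp_no <> NEW.emp_no;
        END;

        INSERT INTO employee VALUES
            (1, 'D01', 'Maintenance'),
            (2, 'D01', 'Maintenance'),
            (3, 'D02', 'Flight Ops');
    """)
    return con


def rename_department_via_one_row(con, emp_no, new_name):
    """Update one row only; the trigger copies the new value to its sibling rows."""
    con.execute("UPDATE employee SET dept_name = ? WHERE emp_no = ?",
                (new_name, emp_no))
    return con.execute(
        "SELECT emp_no, dept_name FROM employee ORDER BY emp_no").fetchall()
```

Note that SQLite does not re-fire a trigger for its own nested update unless recursive triggers are switched on, so the recursion discussed above does not occur in this sketch by default. On a system that does invoke the trigger recursively, the WHEN condition is what ends the recursion.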





Figure 8-27. Derivable Data - Sample Triggers CF182.0

Notes:

The example on the visual illustrates how the integrity of stored derivable data can be maintained by means of triggers.

For our sample airline company, all seats for an aircraft have a row in table SEAT and the number of seats on the aircraft is the number of rows in the table. Thus, the number of seats can be derived from the information in table SEAT. To avoid scanning table SEAT each time the number of seats is needed, the number of seats for an aircraft is also kept in table AIRCRAFT. The appropriate column is Number_of_Seats and must be maintained as seats are added or deleted for an aircraft.

The first trigger (ADDSEAT) is activated each time a seat is added to table SEAT. For each seat added, it increases the number of seats in the row for the aircraft to which the seat belongs. This requires that column Number_of_Seats was initialized to zero when the row for the aircraft was inserted into table AIRCRAFT (default values).

Note that the row for an aircraft must exist before seats can be added to the aircraft. Also note that each row in table SEAT contains the serial number for the aircraft to which the seat belongs.

Derivable Data - Sample Triggers

In table AIRCRAFT, maintain number of seats on aircraft (Number_of_Seats)

    CREATE TRIGGER ADDSEAT
    AFTER INSERT ON SEAT
    REFERENCING NEW AS N
    FOR EACH ROW MODE DB2SQL
    BEGIN ATOMIC
      UPDATE AIRCRAFT
      SET Number_of_Seats = Number_of_Seats + 1
      WHERE Aircraft_Number = N.Aircraft_Number;
    END

    CREATE TRIGGER DELSEAT
    AFTER DELETE ON SEAT
    REFERENCING OLD AS O
    FOR EACH ROW MODE DB2SQL
    BEGIN ATOMIC
      UPDATE AIRCRAFT
      SET Number_of_Seats = Number_of_Seats - 1
      WHERE Aircraft_Number = O.Aircraft_Number;
    END




The second trigger (DELSEAT) is activated each time a seat is deleted from table SEAT. For each seat deleted for an aircraft, it decreases the number of seats in the row for the aircraft in table AIRCRAFT.

For both triggers, the REFERENCING clause is needed to be able to refer to the aircraft number for the seat added or deleted, respectively.
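To see the two triggers at work, the following sketch recreates them in SQLite through Python's sqlite3 module. This is an assumption made for the sake of a runnable example: SQLite uses NEW and OLD directly instead of a REFERENCING clause. Column seat_number is hypothetical, and the tables are reduced to the columns needed here.

```python
import sqlite3


def build_seat_db():
    """AIRCRAFT stores the derivable seat count; SEAT holds one row per seat."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE aircraft (
            aircraft_number TEXT PRIMARY KEY,
            number_of_seats INTEGER NOT NULL DEFAULT 0  -- initialized to zero
        );
        CREATE TABLE seat (
            aircraft_number TEXT NOT NULL,
            seat_number     TEXT NOT NULL,              -- hypothetical column
            PRIMARY KEY (aircraft_number, seat_number)
        );

        CREATE TRIGGER addseat AFTER INSERT ON seat FOR EACH ROW
        BEGIN
            UPDATE aircraft
               SET number_of_seats = number_of_seats + 1
             WHERE aircraft_number = NEW.aircraft_number;
        END;

        CREATE TRIGGER delseat AFTER DELETE ON seat FOR EACH ROW
        BEGIN
            UPDATE aircraft
               SET number_of_seats = number_of_seats - 1
             WHERE aircraft_number = OLD.aircraft_number;
        END;
    """)
    return con


def seat_counts(con, aircraft_number):
    """Return (stored count, count derived by scanning SEAT) for one aircraft."""
    stored = con.execute(
        "SELECT number_of_seats FROM aircraft WHERE aircraft_number = ?",
        (aircraft_number,)).fetchone()[0]
    derived = con.execute(
        "SELECT COUNT(*) FROM seat WHERE aircraft_number = ?",
        (aircraft_number,)).fetchone()[0]
    return stored, derived


def demo():
    """Add three seats, remove one, and report both counts."""
    con = build_seat_db()
    con.execute("INSERT INTO aircraft (aircraft_number) VALUES ('SN1')")
    con.executemany("INSERT INTO seat VALUES ('SN1', ?)",
                    [("1A",), ("1B",), ("1C",)])
    con.execute("DELETE FROM seat WHERE seat_number = '1B'")
    return seat_counts(con, "SN1")
```

As long as every change to SEAT goes through INSERT and DELETE statements, the stored and the derived count stay equal, which is the point of the triggers.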





Figure 8-28. Constraint Integrity CF182.0

Notes:

The data in the tables are also not correct if they violate business constraints (business rules) for the application domain. This may be a rule as simple as that an employee cannot be a pilot and a mechanic at the same time. It may also be a more complex rule such as that a mechanic can only be assigned to the maintenance of an aircraft if he/she has been trained for the appropriate aircraft model. Constraint integrity requires that the business constraints for the application domain are observed.

In Unit 3 - Problem Statement, business constraints were discussed as part of the problem statement for the application domain and the information to be provided for them was listed. In Unit 4 - Entity-Relationship Model, it was described how business constraints are represented in the entity-relationship model.

Some business constraints are expressed by basic modeling constructs in the entity-relationship model. For example, the controlling property is really a business constraint. In some cases, the controlling property translates into delete rule CASCADE for a referential constraint. However, for other cases, it does not. It does not if specified for an m:m relationship type, as we have seen during our discussions about referential integrity. It then has to be handled in the same manner as other business constraints.

Constraint Integrity

Ensures that business constraints are observed

Use triggers and user defined functions

OR

Do not allow end users to maintain tables concerned by SQL DML statements

Provide proper front-ends to end users ensuring that business constraints are observed

Sometimes, other items help such as unique indexes

Of course, the business constraints must be translated into constraints for the tables of the application domain. During the discussions about referential constraints, we saw that the controlling property could be implemented by means of triggers in some cases. Triggers, possibly in conjunction with user-defined functions, are indeed in many cases the means for implementing business constraints within the database management system.

If your database management system does not support triggers or you do not want to use them, you can enforce business constraints by:

• Not allowing end users to maintain the tables concerned by directly using INSERT, UPDATE, or DELETE statements.

• Providing proper front-ends to the end users that ensure that the business constraints are observed.

Sometimes, other functions of the database management system, such as unique indexes, may do the trick. A unique index ensures that the set of columns for which it is defined contains every value only once. Indexes will be discussed in a later unit.

A unique index would solve the business constraint we modeled in Unit 4 - Entity-Relationship Model that, for a flight, each pilot function (CAPTAIN or COPILOT) must only be assigned once. The business constraint translated into the requirement that the combined values for attributes Flight Number, From, To, Flight Locator, and Pilot Function of entity type PILOT ASSIGNMENT must be unique. The resulting implementation is a unique index for the appropriate columns in table PILOT_ASSIGNMENT.
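A minimal sketch of this unique index, assuming SQLite through Python's sqlite3 module. The column names from_airport and to_airport stand for attributes From and To (FROM and TO are reserved words), and employee_number is a hypothetical extra column for the assigned pilot.

```python
import sqlite3


def build_pilot_assignment_db():
    """Create PILOT_ASSIGNMENT with the unique index described above."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE pilot_assignment (
            flight_number   TEXT NOT NULL,
            from_airport    TEXT NOT NULL,    -- attribute From
            to_airport      TEXT NOT NULL,    -- attribute To
            flight_locator  TEXT NOT NULL,
            pilot_function  TEXT NOT NULL,    -- 'CAPTAIN' or 'COPILOT'
            employee_number INTEGER NOT NULL  -- hypothetical column
        );
        CREATE UNIQUE INDEX pilot_function_once
            ON pilot_assignment (flight_number, from_airport, to_airport,
                                 flight_locator, pilot_function);
    """)
    return con


def assign(con, flight, pilot_function, employee_number):
    """Try to assign a pilot; return False if the function is already taken."""
    try:
        con.execute("INSERT INTO pilot_assignment VALUES (?, ?, ?, ?, ?, ?)",
                    (*flight, pilot_function, employee_number))
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a second row for the same flight and function.
        return False
```

For example, with flight = ("CA100", "FRA", "JFK", "L1"), a first CAPTAIN assignment succeeds, a COPILOT assignment succeeds, and a second CAPTAIN assignment for the same flight is rejected by the index.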





Figure 8-29. Constraint Integrity - Example 1 CF182.0

Notes:

As we have seen in Unit 4 - Entity-Relationship Model, maintenance records for Come Aboard include the serial number of the aircraft the maintenance was performed for. The business constraints for Come Aboard state that maintenance records must be kept even if the aircraft is no longer owned by CAB. Even though the record for the aircraft no longer exists, the maintenance records must still contain the serial number for the aircraft they were established for. When a maintenance record is established, a record for the aircraft must exist.

Because the aircraft number of maintenance records may point to aircraft no longer owned by CAB, we could not model the interrelationship as a relationship type. Instead, we introduced a business constraint between entity types AIRCRAFT and MAINTENANCE RECORD: When an instance is added to entity type MAINTENANCE RECORD, entity type AIRCRAFT must contain an instance for the aircraft the maintenance record is established for.

As the entity types are converted into tuple types and into tables, the constraint must be translated into an equivalent constraint for the tables: When a row is added to table MAINTENANCE_RECORD, table AIRCRAFT must contain a row for the aircraft the maintenance record is established for.

Constraint Integrity - Example 1

{ 4 : New maintenance record only for existing aircraft }

Entity types: MAINTENANCE RECORD and AIRCRAFT, connected by constraint { 4 }

Tables: MAINTENANCE_RECORD and AIRCRAFT, connected by constraint { 4 }

    CREATE TRIGGER MRECORD
    NO CASCADE BEFORE INSERT ON MAINTENANCE_RECORD
    REFERENCING NEW AS N
    FOR EACH ROW MODE DB2SQL
    WHEN ( NOT EXISTS
           ( SELECT Aircraft_Number
             FROM AIRCRAFT
             WHERE Aircraft_Number = N.Aircraft_Number ) )
    BEGIN ATOMIC
      SIGNAL SQLSTATE '72002'
        ('AIRCRAFT FOR MAINTENANCE RECORD DOES NOT EXIST');
    END

Because column Aircraft_Number of table MAINTENANCE_RECORD may contain aircraft numbers not contained in table AIRCRAFT, Aircraft_Number is not a foreign key of table MAINTENANCE_RECORD. Therefore, we cannot use referential constraints to ensure that the aircraft for new maintenance records exist.

However, we can use a trigger to ensure the existence of the aircraft for new maintenance records as illustrated by the bottom portion of the visual. Before a row is inserted into table MAINTENANCE_RECORD, the WHEN clause of the trigger checks if the aircraft number for the row exists in table AIRCRAFT. If it does not exist, a nonzero SQL state is raised causing the INSERT statement to terminate.
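The behavior of this trigger can be sketched in SQLite through Python's sqlite3 module. This is an assumption for illustration only: SQLite has no SIGNAL SQLSTATE, so the sketch uses RAISE(ABORT, ...) to terminate the INSERT statement. Column record_id is a hypothetical key for the maintenance records.

```python
import sqlite3


def build_maintenance_db():
    """AIRCRAFT plus MAINTENANCE_RECORD, guarded by an insert trigger only."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE aircraft (aircraft_number TEXT PRIMARY KEY);

        CREATE TABLE maintenance_record (
            record_id       INTEGER PRIMARY KEY,  -- hypothetical key
            aircraft_number TEXT NOT NULL         -- deliberately NOT a foreign key
        );

        CREATE TRIGGER mrecord
        BEFORE INSERT ON maintenance_record
        FOR EACH ROW
        WHEN NOT EXISTS (SELECT 1 FROM aircraft
                          WHERE aircraft_number = NEW.aircraft_number)
        BEGIN
            SELECT RAISE(ABORT, 'AIRCRAFT FOR MAINTENANCE RECORD DOES NOT EXIST');
        END;
    """)
    return con


def add_record(con, aircraft_number):
    """Insert a maintenance record; return False if the trigger rejects it."""
    try:
        con.execute(
            "INSERT INTO maintenance_record (aircraft_number) VALUES (?)",
            (aircraft_number,))
        return True
    except sqlite3.DatabaseError:
        return False


def demo():
    """Record for existing aircraft, record for unknown aircraft, then sell the aircraft."""
    con = build_maintenance_db()
    con.execute("INSERT INTO aircraft VALUES ('SN1')")
    accepted = add_record(con, "SN1")
    rejected = add_record(con, "SN9")
    con.execute("DELETE FROM aircraft WHERE aircraft_number = 'SN1'")
    remaining = con.execute(
        "SELECT COUNT(*) FROM maintenance_record").fetchone()[0]
    return accepted, rejected, remaining
```

Because the check is made only at insert time, the maintenance record survives the deletion of its aircraft, which a referential constraint would not allow without changing the record.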





Figure 8-30. Constraint Integrity - Example 2 (1 of 2) CF182.0

Notes:

In Unit 4 - Entity-Relationship Model, we also modeled the business constraint that mechanics must only be scheduled for the maintenance of an aircraft if they have been trained for the aircraft model. The business constraint applies to relationship type MECHANIC_scheduled_for_AIRCRAFT. As input, it has relationship types AIRCRAFT MODEL_for_AIRCRAFT and MECHANIC_trained_for_AIRCRAFT MODEL.

When translated into a constraint for tables, it applies to table MECHANIC_FOR_AC which contains a row for each mechanic scheduled for the maintenance of an aircraft. Tables AIRCRAFT and MECHANIC_FOR_AM are input for the constraint. Note that table AIRCRAFT has, as foreign key, columns Type_Code and Model_Number specifying the aircraft model for the various aircraft. Therefore, table AIRCRAFT_MODEL is not needed as input for the business constraint.

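Constraint Integrity - Example 2 (1 of 2)

{ 5 : Only trained mechanics for aircraft maintenance }

Entity-relationship model: entity types AIRCRAFT, AIRCRAFT MODEL, and MECHANIC. Constraint { 5 } applies to relationship type MECHANIC_scheduled_for_AIRCRAFT, with relationship types AIRCRAFT MODEL_for_AIRCRAFT and MECHANIC_trained_for_AIRCRAFT MODEL as input ({ 5 } AND).

Referential structure: tables AIRCRAFT_MODEL, MECHANIC, AIRCRAFT, MECHANIC_FOR_AM, and MECHANIC_FOR_AC. Constraint { 5 } applies to table MECHANIC_FOR_AC, with tables AIRCRAFT and MECHANIC_FOR_AM as input ({ 5 } AND).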


Notes:

The visual illustrates the trigger for the business constraint discussed on the previous visual. The trigger ensures that mechanics are scheduled for the maintenance of an aircraft only if they have been trained for the appropriate aircraft model. The trigger achieves this as follows:

• In the WHEN clause, it joins tables MECHANIC_FOR_AM and AIRCRAFT on columns Type_Code and Model_Number. Each row of the intermediate result contains, for an aircraft, the employee number of an employee that has been trained for the aircraft model for the aircraft.

The WHERE clause extracts the rows for the aircraft number and the employee number of the row to be inserted into table MECHANIC_FOR_AC. The NOT EXISTS predicate determines if such rows were found. The result is true if rows were not found and false if rows were found, i.e., the mechanic has been trained for the aircraft model.

• If the WHEN clause is true, i.e., if the mechanic has not been trained for the aircraft model for the aircraft, the triggered action is performed.

The triggered action signals a nonzero SQL state that causes the INSERT statement for table MECHANIC_FOR_AC to terminate. If the mechanic has been trained for the aircraft model, the triggered action is not performed and the row can be inserted.

    CREATE TRIGGER MEFORAC
    NO CASCADE BEFORE INSERT ON MECHANIC_FOR_AC
    REFERENCING NEW AS N
    FOR EACH ROW MODE DB2SQL
    WHEN ( NOT EXISTS
           ( SELECT Employee_Number
             FROM MECHANIC_FOR_AM AS M JOIN AIRCRAFT AS AC
                  ON AC.Type_Code = M.Type_Code AND
                     AC.Model_Number = M.Model_Number
             WHERE AC.Aircraft_Number = N.Aircraft_Number AND
                   M.Employee_Number = N.Employee_Number ) )
    BEGIN ATOMIC
      SIGNAL SQLSTATE '72003'
        ('MECHANIC NOT TRAINED FOR AIRCRAFT MODEL');
    END
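The MEFORAC check can be exercised with the following sketch, which assumes SQLite through Python's sqlite3 module instead of DB2 (RAISE(ABORT, ...) replaces SIGNAL SQLSTATE, NEW replaces N). The tables carry only the columns the constraint needs, and the sample rows are made up.

```python
import sqlite3


def build_mechanic_db():
    """Tables for constraint { 5 } with one trained mechanic and two aircraft."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE aircraft (
            aircraft_number TEXT PRIMARY KEY,
            type_code       TEXT NOT NULL,
            model_number    TEXT NOT NULL
        );
        CREATE TABLE mechanic_for_am (      -- mechanic trained for aircraft model
            employee_number INTEGER NOT NULL,
            type_code       TEXT NOT NULL,
            model_number    TEXT NOT NULL,
            PRIMARY KEY (employee_number, type_code, model_number)
        );
        CREATE TABLE mechanic_for_ac (      -- mechanic scheduled for aircraft
            employee_number INTEGER NOT NULL,
            aircraft_number TEXT NOT NULL,
            PRIMARY KEY (employee_number, aircraft_number)
        );

        CREATE TRIGGER meforac
        BEFORE INSERT ON mechanic_for_ac
        FOR EACH ROW
        WHEN NOT EXISTS (
            SELECT 1
              FROM mechanic_for_am AS m
              JOIN aircraft AS ac
                ON ac.type_code = m.type_code
               AND ac.model_number = m.model_number
             WHERE ac.aircraft_number = NEW.aircraft_number
               AND m.employee_number = NEW.employee_number)
        BEGIN
            SELECT RAISE(ABORT, 'MECHANIC NOT TRAINED FOR AIRCRAFT MODEL');
        END;

        -- Made-up sample data: mechanic 100 is trained for the B737 600 only.
        INSERT INTO aircraft VALUES ('SN1', 'B737', '600'), ('SN2', 'A340', '200');
        INSERT INTO mechanic_for_am VALUES (100, 'B737', '600');
    """)
    return con


def schedule(con, employee_number, aircraft_number):
    """Schedule a mechanic for an aircraft; return False if the trigger rejects it."""
    try:
        con.execute("INSERT INTO mechanic_for_ac VALUES (?, ?)",
                    (employee_number, aircraft_number))
        return True
    except sqlite3.DatabaseError:
        return False
```

Scheduling mechanic 100 for aircraft SN1 is accepted, whereas scheduling the same mechanic for SN2 is rejected because no matching training row exists.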




Notes:

Another business constraint for our sample airline company was that the number of engines mounted on an aircraft must not be larger than the number of engines for the aircraft type.

The left-hand portion of the visual repeats that, in the entity-relationship model, the business constraint is modeled as a constraint between entity types AIRCRAFT TYPE and AIRCRAFT. The aircraft type must be the one for the aircraft whose number of engines is matched against the number of engines that can be mounted. Therefore, in principle, relationship types AIRCRAFT MODEL_for_AIRCRAFT and AIRCRAFT TYPE_for_AIRCRAFT MODEL are also input for the constraint. This could have been indicated by dashed lines in the entity-relationship model. However, since it is self-evident, it has not been shown there to avoid cluttering the entity-relationship model.

When the business constraint is translated into a constraint for the tables of the application domain, it becomes a constraint between tables AIRCRAFT_TYPE and ENGINE. Again, a similar remark applies: To come to the proper aircraft type for the aircraft on which an engine is to be mounted, you must navigate, from ENGINE, via the referential constraints to table AIRCRAFT_TYPE. Since entity type AIRCRAFT MODEL is a dependent entity type of AIRCRAFT TYPE, you can directly go from table AIRCRAFT to table AIRCRAFT_TYPE.

To illustrate this, we have also shown table AIRCRAFT as input for the constraint in the referential structure on the right-hand side of the visual.

{ 1 : Number of engines for aircraft ≤ Number of engines for aircraft type }

Entity-relationship model (left-hand side): entity types AIRCRAFT TYPE, AIRCRAFT MODEL, AIRCRAFT, ENGINE, and ENGINE LOCATION, with constraint { 1 } between AIRCRAFT TYPE and AIRCRAFT.

Referential structure (right-hand side): tables AIRCRAFT_TYPE, AIRCRAFT_MODEL, AIRCRAFT, and ENGINE, with constraint { 1 } between AIRCRAFT_TYPE and ENGINE and table AIRCRAFT as additional input ({ 1 } AND).





Notes:

The implementation of the business constraint that the number of engines mounted on an aircraft must not exceed the number of engines for the aircraft type requires two triggers: a trigger controlling insert operations and a trigger controlling update operations. The visual illustrates the trigger for the insert operations:

• The trigger is activated each time a row is added to table ENGINE. It is activated before the row is inserted and checks if the new engine violates the constraint.

• The appropriate check is made in the WHEN clause.

The first SELECT statement counts the number of engines for the aircraft for the engine being added.

The second SELECT statement joins tables AIRCRAFT and AIRCRAFT_TYPE on column Type_Code. The intermediate result contains, for each aircraft, the number of engines for its aircraft type.

The SELECT statement further extracts the number of engines for the aircraft type of the aircraft to which the new engine is to be added.


    CREATE TRIGGER ADDENGIN
    NO CASCADE BEFORE INSERT ON ENGINE
    REFERENCING NEW AS N
    FOR EACH ROW MODE DB2SQL
    WHEN ( ( SELECT COUNT(*)
             FROM ENGINE
             WHERE Aircraft_Number = N.Aircraft_Number )
           >=
           ( SELECT Number_of_Engines
             FROM AIRCRAFT_TYPE AS AT JOIN AIRCRAFT AS AC
                  ON AC.Type_Code = AT.Type_Code
             WHERE AC.Aircraft_Number = N.Aircraft_Number ) )
    BEGIN ATOMIC
      SIGNAL SQLSTATE '72004' ('TOO MANY ENGINES FOR AIRCRAFT');
    END




The results of the two SELECT statements are compared with each other.

• The triggered action is only performed if the WHEN condition evaluates to true, i.e., if the number of engines for the aircraft would become larger than allowed for the type. In this case, a nonzero SQL state is signaled which causes the INSERT statement to terminate.

If the WHEN condition evaluates to false or unknown, the triggered action is not performed and the new row can be added to table ENGINE.

The trigger for update operations looks the same except that the name for the trigger must be different and the second line must read:

NO CASCADE BEFORE UPDATE ON ENGINE

Because column Aircraft_Number of table ENGINE can also be changed by the delete rule of the referential constraint between tables AIRCRAFT and ENGINE, you should not specify UPDATE OF Aircraft_Number ON ENGINE.

A further note of caution: Because of current restrictions, the trigger may not work on all database management systems.
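As a rough illustration of the insert trigger, the following sketch rebuilds it in SQLite through Python's sqlite3 module. This is an assumption, not the DB2 form shown on the visual: RAISE(ABORT, ...) replaces SIGNAL SQLSTATE and NEW replaces N. Column engine_number and the sample rows are hypothetical.

```python
import sqlite3


def build_engine_db():
    """Tables for constraint { 1 } with one twin-engine aircraft."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE aircraft_type (
            type_code         TEXT PRIMARY KEY,
            number_of_engines INTEGER NOT NULL
        );
        CREATE TABLE aircraft (
            aircraft_number TEXT PRIMARY KEY,
            type_code       TEXT NOT NULL
        );
        CREATE TABLE engine (
            engine_number   TEXT PRIMARY KEY,  -- hypothetical key
            aircraft_number TEXT               -- NULL while the engine is not mounted
        );

        CREATE TRIGGER addengin
        BEFORE INSERT ON engine
        FOR EACH ROW
        WHEN (SELECT COUNT(*)
                FROM engine
               WHERE aircraft_number = NEW.aircraft_number)
             >=
             (SELECT t.number_of_engines
                FROM aircraft_type AS t
                JOIN aircraft AS ac ON ac.type_code = t.type_code
               WHERE ac.aircraft_number = NEW.aircraft_number)
        BEGIN
            SELECT RAISE(ABORT, 'TOO MANY ENGINES FOR AIRCRAFT');
        END;

        -- Made-up sample data.
        INSERT INTO aircraft_type VALUES ('B737', 2);
        INSERT INTO aircraft VALUES ('SN1', 'B737');
    """)
    return con


def mount_engine(con, engine_number, aircraft_number):
    """Add an engine row; return False if the aircraft is already fully equipped."""
    try:
        con.execute("INSERT INTO engine VALUES (?, ?)",
                    (engine_number, aircraft_number))
        return True
    except sqlite3.DatabaseError:
        return False
```

The comparison is made before the row is inserted, which is why the operator is >= rather than >: if the aircraft already has as many engines as its type allows, one more would exceed the limit. For an engine that is not mounted (aircraft number NULL), the second subquery yields no value, the condition is unknown, and the row is accepted.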




Checkpoint


1. Name the four basic types of integrity that must be maintained for a database.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. What is a foreign key?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

3. The order and meaning of the parent-key and foreign-key columns must be the same, not their names. (T/F)

4. Match the following terms with the proper definitions:

a. Parent table

b. Dependent table

c. Self-referencing table

d. Referential constraint

e. Referential integrity for a referential constraint

____ All foreign key values have a matching parent key value.

____ The table containing the foreign key.

____ A correlation between a parent key and a foreign key.

____ The table containing the parent key.

____ A table containing both the parent key and the foreign key for a referential constraint.




5. Describe the difference between delete rules NO ACTION and RESTRICT.
_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

6. Delete rule CASCADE causes the deletion of dependent rows. (T/F)

7. Which are the update rules for referential constraints supported by most database management systems?

a. NO ACTION.

b. RESTRICT.

c. SET NULL.

d. CASCADE.

8. Assume that the delete rule for a referential constraint is CASCADE. Describe a case for which the deletion of a parent row would still fail.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

9. Assume that the controlling property has been specified for the source of an m:m relationship type. How can you ensure that the row for a source instance is deleted if the row for an affiliated relationship instance is deleted?

_____________________________________________________

_____________________________________________________

_____________________________________________________



10. When is table T delete-connected to table T1?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

11. If T is delete-connected to T1 via multiple referential paths with different referential constraints for T, then the delete rules for the referential constraints involving T must be the same and must be CASCADE. (T/F)

12. A self-referencing constraint is a referential cycle. (T/F)

13. A self-referencing table is delete-connected to itself. (T/F)

14. What are the restrictions for referential cycles?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

15. Referential constraints must be defined for the parent table. (T/F)

16. What is the purpose of a referential structure?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

17. The arrow representing a referential constraint in a referential structure points from the parent table to the dependent table. (T/F)




18. What is the meaning of a double-headed arrow in a referential structure?
_____________________________________________________

_____________________________________________________

_____________________________________________________

19. What does domain integrity mean for your tables?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

20. Name two causes for the redundancy of data.

_____________________________________________________

_____________________________________________________

_____________________________________________________

21. The updating of redundant information for violations of the Third Normal Form cannot be controlled by triggers. (T/F)

22. How can you ensure that derivable data are always correct?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

23. What is required to achieve constraint integrity?

_____________________________________________________

_____________________________________________________

_____________________________________________________



24. What are the main ingredients for achieving constraint integrity?

_____________________________________________________

_____________________________________________________

_____________________________________________________






Notes:

Unit Summary

There exist restrictions for delete-connected tables and referential cycles

Referential integrity requires that all foreign key values have matching parent key values

For achieving redundancy or constraint integrity, triggers can be used (if necessary, in conjunction with user-defined functions)

Domain integrity requires the correctness of the values of the columns for the tables of the application domain

Redundancy integrity requires the consistency of redundant information

A referential structure provides an overview of the referential constraints for the application domain or a subset thereof

Constraint integrity requires the observance of the business constraints for the application domain

The delete rules for referential constraints are NO ACTION, RESTRICT, SET NULL, and CASCADE

The update rules for referential constraints supported by most systems are NO ACTION and RESTRICT







Unit 9. Indexes

This unit describes the structure and purpose of indexes and discusses for which columns of the tables indexes should be established from a database design perspective.



• Describe the basic structure of indexes.

• Explain the various options for indexes.

• Describe for which columns you should establish indexes.


Accountability:





Notes:

Conceptually, the tables established can be implemented without indexes. However, in this case, accessing a row may mean searching the entire table for the row and may be very time-consuming and expensive. Indexes present a means for directly accessing specific rows and are needed to ensure performance.

In this unit, we will describe the basic structure of indexes and demonstrate how they are used for directly accessing a row. Furthermore, we will talk about various options for (forms of) indexes such as unique or nonunique indexes.

In addition, we will discuss for which columns, from a database design perspective, you should establish indexes. We will not talk about the usage of indexes from the business-process perspective. The requirements of the business processes for indexes depend on their usage patterns and may change in the course of time. Therefore, indexes for business processes should be established as and when needed and dropped when they are no longer needed.

The database management systems generally provide means for analyzing queries to determine the need for and effectiveness of indexes.

Unit Objectives

Explain the various options for indexes

Describe for which columns you should establish indexes


Describe the basic structure of indexes




9.1 Structure, Options, and Usage of Indexes



Figure 9-2. Indexes in Design Process CF182.0

Notes:

This unit deals with the establishment of indexes for the tables of the application domain. Therefore, it follows the establishment of the tables. Because the referential integrity support of most database management systems requires indexes, the establishment of indexes even follows the establishment of the integrity rules. It is the last step of the storage view.

Indexes in Design Process

(Visual: overview of the design process showing Problem Statement, Conceptual View, Tuple Types, Tables, Integrity Rules, and Indexes.)






Figure 9-3. Purpose of an Index CF182.0

Notes:

The main purpose of an index is to improve performance in cases in which, otherwise, the rows of the table would have to be scanned for locating a row. The visual illustrates this for table AIRCRAFT_MODEL for our sample airline company called Come Aboard. Without an index, when searching for an aircraft model, the data pages with the rows of the table must be retrieved and scanned until the model has been found.

If the row is not contained in the table or multiple rows may exist for the same search criterion, all rows for the table must be inspected. As you can see, this may require a lot of pages (blocks) to be read and, as a consequence, a lot of I/O operations and may be very expensive. The situation can be remedied by an index.

Indexes allow the database management system to directly access individual rows rather than having to scan the rows of the table.

As we will see on the next visual, indexes logically order the rows of the table according to the columns to which they apply. Per se, they do not order the rows physically even though they may be used to ensure that the physical order corresponds to the logical order as closely as possible.

Purpose of an Index

To improve performance in cases in which the locating of a row would require the scanning of the rows

Data Pages AIRCRAFT_MODEL: A340 200, A300 600, B737 300, A320 200, B737 600, B777 200, B747 400

Searching for B737 600

Indexes allow direct access to rows

Indexes provide for logical sequential processing rather than physical sequential processing

Starting/ending with a selected row (e.g., BETWEEN)

Indexes can avoid internal sorting

The logical order of an index allows, without sorting, the rows of the table to be processed in that logical order rather than in their physical order. Combining the logical order with the direct-access capability, an index allows you to start logical sequential processing at a specific row and/or end it with a specific row. In particular, this supports the BETWEEN predicate for SQL queries.

As already indicated, by using the logical ordering of an index, the database management system may be able to avoid internal sorting of the rows retrieved. In particular, this may be the case for SELECT statements using ORDER BY, GROUP BY, or DISTINCT.
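The combination of direct access and logical sequential processing can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how a database management system stores an index: the index is a sorted list of (key, row position) pairs, and the function names are made up for the example. The rows are those of table AIRCRAFT_MODEL on the visual, in their physical order.

```python
from bisect import bisect_left, bisect_right

# Rows of AIRCRAFT_MODEL in physical order (Type_Code, Model_Number).
DATA_ROWS = [
    ("A340", "200"), ("A300", "600"), ("B737", "300"), ("A320", "200"),
    ("B737", "600"), ("B777", "200"), ("B747", "400"),
]


def build_index(rows):
    """Return one (key, row position) entry per row, sorted by key."""
    return sorted((key, position) for position, key in enumerate(rows))


def find(index, key):
    """Direct access: binary search in the index instead of scanning all rows."""
    i = bisect_left(index, (key, -1))
    if i < len(index) and index[i][0] == key:
        return index[i][1]  # position of the row in DATA_ROWS
    return None


def between(index, low, high):
    """Logical sequential processing from low to high (inclusive), without sorting."""
    start = bisect_left(index, (low, -1))
    end = bisect_right(index, (high, len(index)))
    return [key for key, _ in index[start:end]]
```

With index = build_index(DATA_ROWS), find(index, ("B737", "600")) locates the row without inspecting the others, and between(index, ("A320", "200"), ("B737", "300")) returns the keys in ascending order although the rows are stored in a different physical order.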





Figure 9-4. Structure of Indexes CF182.0

Notes:

An index is based on a key, i.e., an ordered set of columns of a table. It is a multilevel tree structure logically ordering the rows of a table in accordance with the key for the index. The order can be ascending or descending depending on what you have requested. You can determine the order when defining the index.

Assuming that the physical order of the rows may be different from the logical order implied by the key, the index must be a dense index. This means that all key values must be reflected by index entries in the lowest index level.

On the bottom of the visual, you see data pages with sample rows for table AIRCRAFT_MODEL. On the lowest index level, there must be an index entry for each row. The index entries are generally grouped into index pages. Within an index page, the index entries are sorted in the requested order in accordance with the key for the index. The key ranges for the index pages do not overlap.

Each index entry contains a key value and a pointer to the appropriate row(s) as indicated on the visual. Thus, all rows of the table must be pointed to by index entries (dense indexes).

Structure of Indexes

Index Key = (Type_Code, Model_Number)

Root Page: B737 300, B777 200, X'FF...FF'

Leaf Pages: A300 600, A320 200, A340 200, B737 300, B737 600, B747 400, B777 200

Data Pages (AIRCRAFT_MODEL): A340 200, A300 600, B737 300, A320 200, B737 600, B777 200, B747 400




The pages of the lowest index level are referred to as leaf pages. In general, the leaf pages are chained together, forward and backward, in the ordering sequence emphasizing that the index logically orders the rows of the table. The chaining of the leaf pages is used for the logical sequential processing of rows.

Since an index may consist of many leaf pages (even more than data pages), it would still be very inefficient to search the leaf pages for a particular row. Therefore, higher index levels are introduced, again consisting of index pages referred to as nonleaf pages. The index entries of the second index level (the one above the leaf pages) order the leaf pages.

Each index entry contains a key value and a pointer to a leaf page. Assuming an ascending key sequence, the key value must be higher than or equal to the highest key value of the leaf page and lower than or equal to the lowest key value of the logically next leaf page. The visual follows DB2 in using the lowest key value of the logically next leaf page. This has an advantage when inserting rows, as we will see later.

The last index entry on any higher index level has a key value of all bytes hexadecimal FF, the highest key value possible.

Since the second index level orders pages rather than rows, it will generally contain only a few pages. If it contains more than one nonleaf page, a third index level is introduced to order the pages of the second index level, and so on. The tree structure stops with an index level that has only one index page.

The index page of the highest index level is referred to as the root page.

The indexes established this way are balanced trees meaning that the number of index levels to be traversed from the root page to a row is the same for all rows. There are other types of indexes possible, but balanced trees have proven to be the best especially if the distribution of the key values is random and cannot be predicted in advance.

Most indexes have two or three levels.
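The construction of the index levels described above can be sketched in Python. This is a simplified illustration under stated assumptions: pages are plain lists, pointers are implied by position, `HIGH_KEY` is a stand-in for the key value of all bytes hexadecimal FF, and the sketch always creates a root page above the leaf level. The names `chunk` and `build_levels` are hypothetical.

```python
HIGH_KEY = ("\uffff", 0)  # stand-in for the all-X'FF' key (assumption of this sketch)


def chunk(items, size):
    """Split a list into consecutive pages of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_levels(sorted_keys, entries_per_page):
    """Return the index levels, leaf level first and root level last."""
    leaf_pages = chunk(sorted_keys, entries_per_page)
    levels = [leaf_pages]
    # Second level: one entry per leaf page, holding the lowest key of the
    # logically next leaf page; the last entry holds the highest possible key.
    entries = [page[0] for page in leaf_pages[1:]] + [HIGH_KEY]
    while True:
        pages = chunk(entries, entries_per_page)
        levels.append(pages)
        if len(pages) == 1:
            return levels  # a level with a single page is the root page
        # Higher levels: one entry per page below, holding that page's last key.
        entries = [page[-1] for page in pages]


keys = sorted([
    ("A340", 200), ("A300", 600), ("B737", 300), ("A320", 200),
    ("B737", 600), ("B777", 200), ("B747", 400),
])
levels = build_levels(keys, 3)
for number, pages in enumerate(levels, start=1):
    print(f"level {number} (counted from the leaf pages): {pages}")
```

With three entries per page, the seven sample keys yield three leaf pages and a root page with the entries B737 300, B777 200, and the high key, which agrees with the visual.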





Figure 9-5. Searching Via an Index CF182.0

Notes:

Using the index illustrated on this visual and the previous visual, we search for aircraft model Boeing B737, model 600. The index is in ascending order.

First, the root page of the index is searched for the proper index entry. The proper index entry is the first index entry whose key value is higher than the given key value. This is the search rule for all index levels above the leaf-page level.

In our example, the proper index entry is the second index entry of the root page, i.e., the one with key value (B777, 200). The entry points to the second leaf page.

When searching leaf pages, you look for the last index entry whose key value is lower than or equal to the given key value. If you find an index entry with the given key value, the desired row exists and is pointed to by the index entry. If the key value of the index entry is lower, the desired row does not exist.

In our case, the entry found is the second index entry of the second leaf page and has the key value searched for. Thus, the row exists and, indeed, the index entry points to the row.
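The two search rules can be sketched in Python as follows. The page contents are taken from the visual; the assignment of entries to leaf pages is inferred from the notes, and the row identifiers, `HIGH_KEY`, and the function name `search` are assumptions made for this illustration.

```python
HIGH_KEY = ("\uffff", 0)  # stand-in for X'FF...FF' (assumption of this sketch)

# Root page entries: (key value, number of the leaf page pointed to).
root_page = [(("B737", 300), 0), (("B777", 200), 1), (HIGH_KEY, 2)]

# Leaf pages with entries (key value, hypothetical row id).
leaf_pages = [
    [(("A300", 600), 1), (("A320", 200), 3), (("A340", 200), 0)],
    [(("B737", 300), 2), (("B737", 600), 4), (("B747", 400), 6)],
    [(("B777", 200), 5)],
]


def search(key):
    """Return the row id for key, or None if no such row exists."""
    # Rule above the leaf level: take the first entry whose key value is
    # higher than the given key value.
    leaf_no = next(page for entry_key, page in root_page if entry_key > key)
    # Rule for leaf pages: take the last entry whose key value is lower
    # than or equal to the given key value.
    candidates = [(k, row) for k, row in leaf_pages[leaf_no] if k <= key]
    if candidates and candidates[-1][0] == key:
        return candidates[-1][1]
    return None  # the entry found has a lower key: the row does not exist


print(search(("B737", 600)))  # 4
print(search(("B757", 300)))  # None
```

A real index would search within a page more efficiently than this linear pass; the sketch only shows which entry each rule selects.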

Searching Via an Index

Index Key = (Type_Code, Model_Number)

Searching for: B737 600

Root Page: B737 300, B777 200, X'FF...FF'

Leaf Pages: A300 600, A320 200, A340 200, B737 300, B737 600, B747 400, B777 200

Data Pages: A340 200, A300 600, B737 300, A320 200, B737 600, B777 200, B747 400




Consider a table with 20,000 rows of 100 characters each and assume that the key for the index is 8 characters long. Furthermore, assume that the size of the data pages and of the index pages is 4K.

Under these assumptions, the rows occupy approximately 500 pages and the index consists of only two index levels. Reading a row using the index requires three pages to be accessed. In contrast, scanning the rows would require on the average 250 pages to be accessed assuming the system stops scanning when it has found the row (which is generally not the case). This illustrates very clearly the advantage of having an index.
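The figures in this example can be checked with a short calculation. The sketch below assumes a pointer size of 4 bytes per index entry and ignores page headers and free space; neither value is given in the text.

```python
import math

ROWS = 20_000
ROW_BYTES = 100
KEY_BYTES = 8
PAGE_BYTES = 4 * 1024
POINTER_BYTES = 4  # assumption: the text does not state the pointer size

rows_per_page = PAGE_BYTES // ROW_BYTES                       # 40 rows
data_pages = math.ceil(ROWS / rows_per_page)                  # 500 pages
entries_per_page = PAGE_BYTES // (KEY_BYTES + POINTER_BYTES)  # 341 entries
leaf_pages = math.ceil(ROWS / entries_per_page)               # 59 leaf pages

# Add index levels until a level fits into a single page (the root page).
levels = 1
pages = leaf_pages
while pages > 1:
    pages = math.ceil(pages / entries_per_page)
    levels += 1

pages_read_via_index = levels + 1  # one page per index level plus the data page
average_scan = data_pages / 2      # average for a scan stopping at the row

print(data_pages, leaf_pages, levels, pages_read_via_index, average_scan)
# 500 59 2 3 250.0
```

Under these assumptions, the 59 leaf pages are ordered by a single root page, which gives the two index levels and the three page accesses mentioned above.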





Figure 9-6. Unique and Nonunique Indexes CF182.0

Notes:

Unique indexes come in two flavors:

• Plain unique indexes consider the NULL value as a regular value and require/enforce that each key value occurs in at most one row.

For an index key consisting of one column, this means that the NULL value may occur in at most one row. Thus, uniqueness is enforced for all values including the NULL value.

For an index key consisting of two columns, two key values (a,b) and (c,d) are considered equal if a=c and b=d. This includes the NULL value: (a,NULL) and (c,NULL) are considered different if a and c are different and identical if they are the same.

In particular, (plain) unique indexes can be used for the following two purposes:

- They can be used to guarantee the uniqueness of the values of the primary key for a table.

- They can be used to guarantee the uniqueness of the values of a foreign key resulting from the merging of the tuple type for a 1:1 relationship type.

Unique and Nonunique Indexes

(Plain) Unique Index

Every value of the key including the NULL value may occur at most once

Can be used to guarantee uniqueness of values for primary key of table

For foreign keys for merged 1:1 relationship types

Unique-Where-Not-NULL Index

Every value of the key excluding the NULL value may occur at most once; NULL value may occur multiple times

For foreign keys for imbedded 1:1 relationship types

Nonunique Index

Every value of the key may occur multiple times

For foreign keys for merged or imbedded 1:m relationship types

For individual columns of composite keys




Because the relationship type is a 1:1 relationship type, each defining attribute could be the relationship key. Therefore, the corresponding (composite) attributes of the related tuple type can assume each value only once. This also applies to the attribute that has not been made the primary key of the tuple type.

For tuple types to be merged, they must, at all times, have the same primary key values and, thus, number of tuples. As a consequence, the foreign key resulting from the defining attribute that has not been made the primary key can assume each value only once. Therefore, a (plain) unique index can be used to ensure the uniqueness of the foreign key values.

• Unique-where-not-NULL indexes treat each occurrence of the NULL value as different and require/enforce that each key value occurs in at most one row.

For an index key consisting of one column, this means that each value except the NULL value must occur in at most one row. Thus, uniqueness is only enforced for those values that are not NULL.

For a key consisting of two columns, each occurrence of (a,NULL) is considered different. Thus, uniqueness is only enforced for those key values for which none of their components is the NULL value.

Unique-where-not-NULL indexes can be used to guarantee the uniqueness of the values (that are not NULL) of foreign keys resulting from the imbedding of tuple types for 1:1 relationship types.

The rationale is similar to the one for merged tuple types of 1:1 relationship types. However, the imbedded tuple type may have, at any point in time, fewer tuples than the target tuple type. As a consequence, for some of the rows, the foreign key may not have a value requiring a unique-where-not-NULL index rather than a plain unique index.

Nonunique indexes allow any value of the key to occur in any number of rows.

In particular, nonunique indexes can be used for:

• The foreign keys resulting from merged or imbedded 1:m relationship types.

• Individual columns of composite keys. Even though the values of the composite key may have to be unique, the values of the individual columns need not be unique.
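The difference between the three types of indexes can be summarized in a small Python sketch. It is only an illustration of the rules stated above, not the implementation of any database management system. Key values are tuples, `None` stands for the NULL value, and the function name `violates_uniqueness` and the three type names are chosen for this sketch.

```python
def violates_uniqueness(kind, existing_keys, new_key):
    """Return True if adding new_key would violate the rule of the index.

    kind is "unique", "unique_where_not_null", or "nonunique".
    """
    if kind == "nonunique":
        return False  # any value may occur in any number of rows
    if kind == "unique_where_not_null" and None in new_key:
        return False  # each occurrence of the NULL value counts as different
    # Plain unique index: NULL is treated like a regular value.
    return new_key in existing_keys


# One-column key: a second NULL is rejected only by the plain unique index.
print(violates_uniqueness("unique", [(None,)], (None,)))                 # True
print(violates_uniqueness("unique_where_not_null", [(None,)], (None,)))  # False

# Two-column key: (a, NULL) and (c, NULL) differ for a plain unique index.
print(violates_uniqueness("unique", [("a", None)], ("c", None)))         # False
```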





Figure 9-7. Clustering Index CF182.0

Notes:

Indexes marked as clustering indexes are used by the database management system to control where new rows are inserted. They are used to determine the insertion point for the new rows.

By using an index to determine the insertion point, the database management system attempts to make the physical sequence of the data pages equal to the logical sequence implied by the index. However, a new row is inserted at the point determined via the index only if the located page contains enough free space for the row. Therefore, when defining the space object for the table, you should request that free space is left in the data pages for later insertions when the rows are loaded.

Clustering Index

Controls where new rows are inserted

Insertion point for new row determined via index

Attempts to make physical sequence of data pages equal to logical sequence imposed by key for index

Only if free space available at insertion point

Should request free space during definition of corresponding space object

Row inserted elsewhere if space not available at insertion point

Only one clustering index for a table

Supported only by some database management systems

For example, DB2 Universal Database for OS/390

As mentioned before, the database management system attempts to insert the new row in the page determined by means of the clustering index. If the data page does not contain enough space for the row, the row is not inserted into the page. The data page is not split either. Instead, the database management system inserts the new row into the closest page with sufficient free space in the neighborhood of the ideal insertion point. If none of the pages in the neighborhood has enough free space, the row is inserted somewhere else.

Since the logical order of the index determines the physical order of the data pages and the rows can only be ordered according to one criterion, there can only be one clustering index for a table.

A clustering index is advantageous if you have business processes processing the rows in the logical order imposed by the index. Since the data pages are pretty much in the logical order of the key, the database management system need not jump constantly from one place on the storage volume to another. It can efficiently use techniques such as sequential prefetch to read a set of physically adjacent data pages with a single I/O operation.

Clustering indexes are only supported by a few database management systems. They are supported, for example, by DB2 Universal Database for z/OS.





Figure 9-8. Clustering Index - First Insertion (1 of 2) CF182.0

Notes:

The visual illustrates how the insertion point for a new row of table AIRCRAFT_MODEL is determined using a clustering index. The aircraft model to be inserted has type code B757 and model number 300.

The index is searched in precisely the same manner as for the retrieval of rows: The second index entry of the root page is the first index entry with a higher key value than the new row. Therefore, it is the one that points to the proper leaf page, the second leaf page on the visual.

In the leaf page, you look for the last index entry with a key value lower than or equal to the key value of the new row. Since the leaf page contains a single index entry, it is the entry found and its key is lower. As a consequence, the new row will be inserted into the data page pointed to by the index entry provided the data page has sufficient free space.

For the example, the new row is to be inserted into the third data page. It contains enough free space. The new row will follow the row with key (B747, 400). This is indeed the proper place for maintaining the data pages in the logical order implied by the index. The next visual shows the data pages and the index after the insertion of the row.

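Clustering Index - First Insertion (1 of 2)

Index Key = (Type_Code, Model_Number)

Inserting: B757 300

Root Page: B747 400, B777 200, X'FF...FF'

Leaf Pages: A300 600, A320 200, A340 200, B747 400, B777 200

Data Pages: A300 600, A320 200, A340 200, B747 400, B777 200

Insertion Point: third data page, following the row B747 400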




Figure 9-9. Clustering Index - First Insertion (2 of 2) CF182.0

Notes:

Since the third data page has enough free space for the new row, the row with key (B757, 300) is inserted into this data page. An appropriate index entry is added to the second leaf page.

Clustering Index - First Insertion (2 of 2)

Index Key = (Type_Code, Model_Number)

Root Page: B747 400, B777 200, X'FF...FF'

Leaf Pages: A300 600, A320 200, A340 200, B747 400, B757 300, B777 200

Data Pages: A300 600, A320 200, A340 200, B747 400, B757 300, B777 200





Figure 9-10. Clustering Index - Second Insertion (1 of 2) CF182.0

Notes:

The visual illustrates the locating of the insertion point for a second insertion: A new row with key (B767, 200) is to be inserted.

This time, the second index entry of the second leaf page is the last index entry whose key value is lower than or equal to the key value of the row to be inserted. The data page pointed to by the index entry is the third data page, the same as for the previous insert request.

Since the third data page does not have any free space, the new row cannot be inserted into the data page. The system looks for the closest data page with enough free space. The second data page and the fourth data page have enough free space and are equally close. Since the index is in ascending order, later pages in logical order are preferred and the fourth data page is chosen. The new row will be inserted into the free space of the fourth data page, i.e., following the row with key (B777, 200).

The next visual illustrates the insertion of the row.

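Clustering Index - Second Insertion (1 of 2)

Index Key = (Type_Code, Model_Number)

Inserting: B767 200

Root Page: B747 400, B777 200, X'FF...FF'

Leaf Pages: A300 600, A320 200, A340 200, B747 400, B757 300, B777 200

Data Pages: A300 600, A320 200, A340 200, B747 400, B757 300, B777 200

Insertion Point: fourth data page, following the row B777 200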




Figure 9-11. Clustering Index - Second Insertion (2 of 2) CF182.0

Notes:

Since the third data page does not have enough free space for the new row, the row with key (B767, 200) is inserted into the fourth data page. An appropriate index entry is added to the second leaf page.

As you can see, the physical order of the rows no longer coincides with the logical order.
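The two insertions can be replayed with a small Python sketch. Several details are assumptions made for the illustration and are not given in the text: each data page holds at most two rows, the first two data pages are filled as shown, and the search for free space simply widens page by page instead of distinguishing between the "neighborhood" and "somewhere else". The names `has_space`, `choose_page`, and `insert` are hypothetical.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # assumption: rows per data page

# Assumed initial layout of the four data pages (page numbers start at 0).
data_pages = [
    [("A300", 600), ("A320", 200)],
    [("A340", 200)],
    [("B747", 400)],
    [("B777", 200)],
]
# Leaf-level index entries (key, data page number), kept in key order.
index = sorted(
    (key, page_no) for page_no, page in enumerate(data_pages) for key in page
)


def has_space(page_no):
    return len(data_pages[page_no]) < PAGE_CAPACITY


def choose_page(key):
    """Pick the data page for a new row via the clustering index."""
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1  # last entry with key <= new key
    target = index[pos][1] if pos >= 0 else 0
    if has_space(target):
        return target
    # Closest page with free space; on a tie, the later page is preferred.
    for distance in range(1, len(data_pages)):
        for candidate in (target + distance, target - distance):
            if 0 <= candidate < len(data_pages) and has_space(candidate):
                return candidate
    raise RuntimeError("no data page has free space")


def insert(key):
    page_no = choose_page(key)
    data_pages[page_no].append(key)
    index.append((key, page_no))
    index.sort()
    return page_no


first = insert(("B757", 300))   # third data page (page number 2)
second = insert(("B767", 200))  # third page is full, so the fourth is used
print(first, second)  # 2 3
```

After the second insertion, the fourth data page holds B777 200 followed by B767 200, so the physical order of the rows deviates from the logical order, as on the visual.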

Clustering Index - Second Insertion (2 of 2)

Index Key = (Type_Code, Model_Number)

Root Page: B747 400, B777 200, X'FF...FF'

Leaf Pages: A300 600, A320 200, A340 200, B747 400, B757 300, B767 200, B777 200

Data Pages: A300 600, A320 200, A340 200, B747 400, B757 300, B777 200, B767 200





Figure 9-12. Partitioning Index CF182.0

Notes:

Partitioning indexes are a special form of clustering indexes. Thus, they have all the features of clustering indexes. In addition, you must define key ranges for the key values of the index. In turn, the key ranges for the index subdivide the rows for the table into corresponding key ranges referred to as partitions.

Assuming that a partitioning index has been defined on column Employee_Number of table EMPLOYEE for Come Aboard, the employees are partitioned in accordance with the key ranges for the index. The example on the visual illustrates a partitioning into three partitions: The first partition contains all rows for employees with an employee number smaller than or equal to 1350000; the second partition all rows for employees with employee numbers larger than 1350000, but not larger than 2999999; and the third partition the rows for all remaining employees.

The partitioning can, however, only have an effect if something more is connected to it. Generally, the following functions come with the subdivision into partitions:

Partitioning Index

A special clustering index

Subdivides rows for table into key ranges

Key ranges referred to as partitions

For example:

1st partition: All employees with Employee_Number ≤ 1350000

2nd partition: All employees with Employee_Number > 1350000 and Employee_Number ≤ 2999999

3rd partition: All employees with Employee_Number > 2999999

Partitions in different physical spaces

Rows always inserted in physical space for partition

Partitions can be processed separately and in parallel by utilities

SQL operations can access partitions in parallel reducing run times

• The rows of the partitions are placed into different physical spaces which may reside on different cylinders or even different volumes. The rows for a partition are always placed into the physical space for that partition and never into the physical space for another partition.

• Utilities such as a load utility, a backup utility, a recovery utility, or a reorganization utility can process individual partitions, can process the partitions separately or jointly, and can process the partitions in parallel.

• SQL operations can process the partitions in parallel using multiple tasks or processes of the operating system. This can considerably reduce the run time for SQL operations, especially queries.

These points imply that partitioning indexes are worth considering if you have large tables. However, it is a prerequisite that you can reasonably subdivide the rows of the table into partitions.

Since clustering indexes are only supported by a few database management systems, partitioning indexes are also only supported by a few database management systems. For example, they are supported by DB2 Universal Database for z/OS. Other systems use other means to partition the rows of tables.
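The assignment of rows to partitions by key range can be sketched in Python. The limits are those of the EMPLOYEE example above; the names `PARTITION_LIMITS` and `partition_for` are hypothetical, and the sketch says nothing about how a database management system stores or processes the partitions.

```python
from bisect import bisect_left

# Highest Employee_Number of each partition except the last one.
PARTITION_LIMITS = [1350000, 2999999]


def partition_for(employee_number):
    """Return the partition number (starting at 1) for an employee number."""
    # bisect_left finds the first limit that is >= the employee number;
    # numbers above all limits fall into the last partition.
    return bisect_left(PARTITION_LIMITS, employee_number) + 1


print(partition_for(1350000))  # 1
print(partition_for(1350001))  # 2
print(partition_for(3000000))  # 3
```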





Figure 9-13. Use of Indexes CF182.0

Notes:

From a database design perspective, the following rules apply for the use of indexes:

• For each primary key, define a (plain) unique index independent of the number of data pages the table occupies. By allowing each primary key value to occur only once, the index ensures the unique identification of the rows. Without the index, each primary key value could occur more than once.

If you are using the referential integrity support of your system for a referential constraint, most systems require a unique index for the primary key. The index (and only the index) is used to check if the parent row exists for a row inserted into the dependent table.

• For each foreign key, define an index if the rows of the table occupy more than three data pages.

When using the referential integrity support of your database management system, the system will generally not force you to have an index for the foreign key. If an index exists for the foreign key, it is used to ensure the referential integrity when deleting rows of the parent table.

Use of Indexes

Because of maintenance overhead, create additional indexes only if really required

Good candidates for indexes are columns used (frequently) for Joins, ORDER BY, GROUP BY, DISTINCT, or direct access of rows

If the table does not change after loading, you can create any indexes

For each primary key, define a plain unique index

Ensures unique identification of the rows

Need not have a second index if other index exists whose key contains:

Key columns for second index as leading key columns

Key columns for second index in same sequence

For each foreign key, define an index

If index exists, it is used by referential integrity support for ensuring referential integrity when deleting a row of the parent table

Can cause poor performance if index is missing




For delete rules NO ACTION or RESTRICT, the index (and only the index) is used to check if the dependent table has dependent rows. For delete rule SET NULL, it is used to determine the dependent rows whose foreign key values must be reset to NULL. For delete rule CASCADE, it is used to determine the rows of the dependent table that must be deleted.

The missing index for the foreign key of a referential constraint is often the reason for complaints about the poor performance of the referential integrity support.

• If you have an index for a composite key and need an index for leading columns of that key, you do not need to define an additional index. This assumes that the columns are in the required order. The system is generally able to use the index for the composite key since it is, in particular, ordered in accordance with the required leading key columns.

• The maintenance of indexes due to insert, update, or delete operations cannot be neglected. Therefore, you should introduce additional indexes only if they are really required by the business processes and not as a precautionary measure. Of course, you can add any indexes, as long as you do not mind the space they occupy, if the rows of the table do not change after the loading of the table.

• Good candidates for indexes are columns that are used for Join operations, the SQL ORDER BY, GROUP BY, or DISTINCT clauses/keywords, or for the direct access of rows.

Most systems have tools that allow you to determine the effectiveness of an index.





Figure 9-14. No Index for Leading Foreign Key CF182.0

Notes:

For m:m relationship types, you need a table containing columns for the defining attributes for the relationship type. As you know, the defining attributes together form the relationship key and, therefore, the primary key of the table. Thus, you should have a unique index comprising all columns for the defining attributes.

For each of the defining attributes, the key consisting of the columns for the defining attribute represents a foreign key. One of these keys is the first part of the primary key. Thus, its columns are the leading columns of the primary key. Therefore, you do not need an additional index for that key. However, you should have an index for the other foreign key because the primary (key) index is ordered differently.

The visual illustrates this for table MECHANIC_FOR_AC, the table for m:m relationship type MECHANIC_scheduled_for_AIRCRAFT. The table consists of the primary key columns for tables MECHANIC and AIRCRAFT, i.e., columns Employee_Number and Aircraft_Number. Together, the columns form the primary key. Let us assume that Employee_Number is the first column of the primary key and that there is a unique index for the primary key.

No Index for Leading Foreign Key

MECHANIC

Employee_Number

AIRCRAFT

Aircraft_Number

Employee_Number Aircraft_Number

MECHANIC_FOR_AC

Index on primary key (Employee_Number, Aircraft_Number)

No index required for foreign key Employee_Number




Individually, columns Employee_Number and Aircraft_Number are foreign keys as indicated by the referential constraints on the visual. Since Employee_Number is the first column of the primary key, you do not need an index for it. The primary (key) index is used instead. However, you should have a nonunique index for column Aircraft_Number.
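The rule for leading key columns can be expressed as a simple check. The following Python sketch is only an illustration; the function name `covers_as_leading_columns` is hypothetical, and the column lists are those of table MECHANIC_FOR_AC.

```python
def covers_as_leading_columns(index_columns, needed_columns):
    """True if needed_columns are the leading columns of the index, in order."""
    needed = list(needed_columns)
    return len(needed) > 0 and list(index_columns[:len(needed)]) == needed


primary_key_index = ["Employee_Number", "Aircraft_Number"]

# Foreign key Employee_Number is covered by the primary key index.
print(covers_as_leading_columns(primary_key_index, ["Employee_Number"]))  # True

# Foreign key Aircraft_Number is not a leading column: it needs its own index.
print(covers_as_leading_columns(primary_key_index, ["Aircraft_Number"]))  # False
```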





Figure 9-15. Indexes - Documentation CF182.0

Notes:

For each index, you should provide the following information:

• The name of the table for which the index is established.

• A name for the index, selected in agreement with the naming rules for indexes for your database management system. The name should be unique for the application domain and must not be the name of a table.

The name is only used to identify the index to the database management system. It is not needed by end-users or for any other objects being defined. Therefore, you could omit it and leave it to the database administrator to select a name when he/she defines the index.

• The ordered list of columns making up the key for the index.

• The properties the index should have: whether it should be a (plain) unique index, a unique-where-not-NULL index, or a nonunique index, and whether or not it should be a clustering index or a partitioning index.

• For a partitioning index, the key ranges for the partitions of the table.
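If you keep the index documentation in machine-readable form, the items listed above could be recorded as in the following Python sketch. The class name `IndexDocumentation`, its field names, and the checks in `problems` are assumptions made for this illustration; the course does not prescribe a particular format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNIQUENESS_VALUES = ("UNIQUE", "UNIQUE WHERE NOT NULL", "NONUNIQUE")


@dataclass
class IndexDocumentation:
    table: str                    # table for which the index is established
    key_columns: List[str]        # ordered list of columns making up the key
    uniqueness: str = "NONUNIQUE"
    clustering: bool = False
    partitioning: bool = False
    name: Optional[str] = None    # may be left to the database administrator
    key_ranges: List[str] = field(default_factory=list)

    def problems(self):
        """Return a list of inconsistencies found in the documentation."""
        found = []
        if not self.key_columns:
            found.append("index key has no columns")
        if self.uniqueness not in UNIQUENESS_VALUES:
            found.append("unknown uniqueness property")
        if self.partitioning and not self.key_ranges:
            found.append("partitioning index without key ranges")
        if self.key_ranges and not self.partitioning:
            found.append("key ranges given for a nonpartitioning index")
        return found


doc = IndexDocumentation(
    table="MECHANIC_FOR_AC",
    key_columns=["Employee_Number", "Aircraft_Number"],
    uniqueness="UNIQUE",
)
print(doc.problems())  # []
```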

Indexes - Documentation

For each index:

Table for Index: Name of table to which index applies

Index Name: Name for index. Should be unique for application domain

Index Key: Ordered list of columns over which index is defined

Properties: UNIQUE, UNIQUE WHERE NOT NULL, NONUNIQUE, CLUSTERING, PARTITIONING

Key Ranges: For a partitioning index, key ranges for the partitions




Checkpoint


1. What is the main purpose of an index?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. Indexes allow the database management system to directly access the rows of the table on which the index has been defined. (T/F)

3. Indexes may help the database management system to avoid the internal sorting of rows. (T/F)

4. What does it mean that an index is a dense index?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

5. The root page is always a nonleaf page. (T/F)

6. Assume that you have defined a plain unique index on a column of a table. For how many rows of the table can the column contain the NULL value?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________




7. For an index whose key consists of one column, match the following definitions with the corresponding type of index:

____ Nonunique index

____ Plain unique index

____ Unique-where-not-NULL index

a. Every value including the NULL value must occur in at most one row.

b. Every value excluding the NULL value must occur in at most one row.

c. Every value can occur in any number of rows.

8. Describe two cases for the usage of plain unique indexes.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

9. Describe a case for the usage of a unique-where-not-NULL index.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

10. What does the system attempt to do if you have a clustering index for a table?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________




11. Which of the following actions are taken if the data page for a new row determined by a clustering index does not have enough free space for the row?

a. The row is not inserted.

b. The data page is split in half and the row is inserted into one of the new data pages.

c. If there is a data page in the neighborhood that has enough free space, the row is inserted into that data page; otherwise, it is inserted somewhere else.

d. The row is always inserted at the end of the data pages.

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

12. From the database design perspective, for which columns should you establish an index?

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________

_____________________________________________________






Notes:

Unit Summary

Clustering indexes attempt to store the rows into the data pages in such a way that the physical sequence agrees with the implied logical sequence

Plain unique indexes ensure that each value of the index key including the NULL value occurs in at most one row

Nonunique indexes allow any value to occur in any number of rows

Unique-where-not-NULL indexes ensure that each value of the index key excluding the NULL value occurs in at most one row

You should establish an index for each foreign key

Establish a (plain) unique index for each primary key

Indexes allow the database management system to directly access the rows of a table

Indexes support the logical sequential processing of rows without sorting

Indexes help avoid internal sorts by the database management system







Unit 10. Logical Data Structures

This unit makes the transition from storage view to logical view. It describes logical data structures and briefly discusses views which complement the logical data structures. Logical data structures are established for business processes and illustrate in which tables the data for a business process are located and also show the process-specific flow through the tables of the application domain.



• Explain the purpose of logical data structures.

• Understand who has the responsibility for the establishment of logical data structures.

• Describe the components of logical data structures and their representation.

• Explain the relationship between business processes and logical data structures.

• Describe the interrelationship between logical data structures and views.


Accountability:






Notes:

After having established the tables, integrity rules, and indexes for the application domain, we must make the transition from storage view to logical view. The transition verifies the design of the database and proves that it meets the requirements of the business processes. The verification is accomplished by establishing the logical data structures for the business processes described in the process inventory.

This unit describes logical data structures and briefly discusses views which complement them. It describes:

• The purpose of logical data structures.

• Who is responsible for establishing the logical data structures for the business processes and the role of the database designer.

• The components of logical data structures and how they are represented.

• The relationship between the business processes and logical data structures.

• The interrelationship between logical data structures and views. Views are relational database objects describing subsets and combinations of one or more tables.

Unit Objectives

Understand who has the responsibility for the establishment of logical data structures


Explain the purpose of logical data structures

Describe the components of logical data structures and their representation

Describe the interrelationship between logical data structures and views

Explain the relationship between business processes and logical data structures




10.1 Logical Data Structures



Figure 10-2. Logical Data Structures in Design Process CF182.0

Notes:

After the tables, integrity rules, and indexes for the application domain have been determined, it is time to verify that the database design meets the requirements of the business processes. As part of logical view, the logical data structures are established for all business processes described in the process inventory.

The logical view looks at the data of the application domain from the perspective of the business processes for the application domain. Accordingly, the logical data structures show which tables of the application domain contain the data needed by the business processes. They also describe how to navigate from table to table when accessing the data.

The tables established for the application domain are the primary input for the establishment of the logical data structure. The integrity rules, more precisely, the referential constraints between primary keys and foreign keys, are a second input because they show the natural paths between the various tables.

[Visual: Logical Data Structures in Design Process. Labels shown: Problem Statement, Conceptual View, Tuple Types, Tables, Integrity Rules, Indexes.]






Figure 10-3. Logical Data Structures - Purpose CF182.0

Notes:

As the last step of the conceptual view, the data needed by the business processes of the process inventory were described as data elements and data groups in the data inventory.

Based on the data elements and data groups in the data inventory, the tables for the application design were developed. The data elements became the columns of the tables. The data groups only provided structural information needed for normalization and the splitting of tuple types.

In general, the data elements for a single business process constitute a small subset of the columns of the tables and may be located in different tables. Therefore, for the individual business processes, it is necessary to identify:

• The columns and tables corresponding to the data elements used by the business process.

• How the business process can find, using the data found in one table, related data in other tables.

Logical Data Structures - Purpose

Data Inventory

Data Elements and Data Groups

Process

Tables

Data elements used by a process form a subset of the tables


Logical data structures identify tables and columns needed by the process

Logical data structures identify how to navigate from table to table

Logical data structures describe logical view and data flow of process



This is where the logical data structures come into play. The logical data structures for a business process describe the subset of tables and columns needed by the business process or a part of it. They also illustrate how the business process, or the appropriate part of it, must navigate logically through the tables to achieve its function. Thus, they reflect the logical view that the business process (or the part) has of the tables and the data flow between the tables for the business process.





Figure 10-4. Logical Data Structures - Responsibilities CF182.0

Notes:

The logical data structures describe the logical views the business processes have of the tables and the flow of data between the tables for the processes. They must show the application programmers which tables contain the data (columns) for the business processes and how to navigate from one table to the next. Thus, when establishing the logical data structures, the interfaces between the business processes and the tables are exposed.

The development of the logical data structures is a joint effort between the database designer and the application programmers. The database designer must participate in the development because he/she knows the tables, their columns, and the referential constraints between the tables. The referential constraints, representing relationships between primary keys and foreign keys, provide natural paths between the tables. They are the primary vehicles for interconnecting the various tables.

However, the database designer cannot establish the logical data structures on his/her own. The establishment of the logical data structures requires a detailed knowledge of the business processes and may already consider implementation details. Therefore, the application programmers writing the programs or queries for the business processes must participate in the development of the logical data structures. They should have a primary interest in the logical data structures. They should also have the required knowledge of the business processes. How else could they implement them?

Logical Data Structures - Responsibilities

Logical views of tables and data flows between tables for processes

Must show application programmers:

• Which tables contain data for processes

• How to navigate from one table to the next

Allows verification of tables for application domain

Joint effort between database designer and application programmers:

• Database designer: knows tables, knows referential constraints

• Application programmers: must write programs for processes, (should) know processes

Input for application programmers

Instead of the application programmers, the application domain expert could participate in the development of the logical data structures. However, since implementation considerations may affect the logical data structures for a business process, the participation of the application programmers is preferable. Because the logical data structures are input for the application programmers, they should be the driving force in establishing them.

You may ask what interest the database designer has in the development of the logical data structures. He/she has a very good reason for participating: by establishing the logical data structures, the correctness and completeness of his/her database design are verified. In addition, some performance bottlenecks may be revealed, leading to additional denormalizations, the combining of tables, or the splitting of tables.

The detection of design problems requires a reiteration of the design process rather than patches to the tables. By just patching the tables, the quality of the design is jeopardized and the rationale for design decisions is easily lost. If the changes are minor, it does not take much time to verify and correct the intermediate design steps and, thus, validate the basic design concept. If the changes are major, you had better follow the design steps from top to bottom when rectifying the problem.





Figure 10-5. Sample Business Process CF182.0

Notes:

The business process on this visual is a business process for our sample airline company called Come Aboard. For a given maintenance number, the business process displays information about the maintenance record, the aircraft for the maintenance record, and the subrecords for the maintenance record.

For the maintenance record itself, it displays the date when the maintenance was performed and the type of maintenance performed. In addition, it displays the employee number and the name (last name, first name, and middle initial) of the employee who performed the maintenance. Furthermore, it displays the aircraft number of the aircraft for which the maintenance was performed.

If Come Aboard still has data about the aircraft, the date when the aircraft was manufactured and the date when the aircraft was put into service are displayed. In addition, the model number and type code for the aircraft and the name of the manufacturer are displayed.

A maintenance record may have subrecords which again may have subrecords and so on. For each subrecord, the date and type of maintenance are displayed.

Sample Business Process

For a specified maintenance number, display the following information:

1. The date when the maintenance was performed and the type of maintenance performed.

2. The employee number and the name of the mechanic who performed the maintenance.

3. The aircraft number of the aircraft for which the maintenance was performed.

4. If the aircraft is still owned by CAB, the date when the aircraft was manufactured, the date when the aircraft was put into service, the model number and type code for the aircraft, and the name of the manufacturer.

5. For each subrecord (direct or indirect) for the maintenance record, the date of the maintenance and the type of maintenance performed.

Display Maintenance Record Summary




We will come back to the various points on the visual when discussing the logical data structure for the business process.





Figure 10-6. Sample Structure Diagram CF182.0

Notes:

A logical data structure consists of three components:

• A Structure Diagram illustrating how the various tables for the logical data structure are interconnected. Since the structure diagram is the component that most resembles what you would expect from a structure, the term logical data structure is frequently used synonymously for it.

• A Path Summary describing the columns through which the tables of the structure diagram are interconnected.

• A Table Summary listing the columns needed for the various tables of the structure diagram.

The current visual illustrates the structure diagram for the logical data structure for our sample business process. Basically, the structure diagram looks as follows:

• The rectangular boxes in the structure diagram represent the tables used by the business process (or a part of it) associated with the logical data structure. The boxes contain the names of the tables.

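Sample Structure Diagram

[Visual: structure diagram with the boxes INPUT, MAINTENANCE_RECORD/1, EMPLOYEE, AIRCRAFT, AIRCRAFT_TYPE, MANUFACTURER, and MAINTENANCE_RECORD/2. The arrows are labeled 1 through 7; the label C (delete rule CASCADE) marks the target end of path 7.]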

If a business process uses the same table multiple times, for the same purpose or for different purposes, the table occurs multiple times in the structure diagram. A different usage may require different columns of the table. To tell the different uses apart and correctly assign the columns to their uses in the table summary, the names of tables occurring multiple times are suffixed with "/n", where n uniquely numbers the different uses. In the example, table MAINTENANCE_RECORD is used for two purposes, as will be described later.

• An arrow interconnecting two tables illustrates a data flow in the direction of the arrow. The table at the beginning of the arrow is referred to as source table for the flow, the table at the end as target table. A value found in the source table is used unmodified to access the corresponding rows in the target table. For example, the employee number found in a maintenance record is used to access the row for the appropriate employee in table EMPLOYEE ( 2 ). This corresponds to a Join operation for the tables.

The tables can be joined through a single column or multiple columns. The columns in the target table can be named differently, but their function must be the same.

• The arrows are labeled to establish a reference to the path summary for the logical data structure. For each interconnection of two tables (path), the path summary lists the columns of the source table as well as the columns of the target table. If the interconnection is through multiple columns, the column names are preceded by sequence numbers establishing the correspondence between the respective source and target columns.

• As for referential structures, single-headed and double-headed arrows are used to indicate how many rows may be found in the target table for a value. A single-headed arrow means that at most one row with the source value may be found in the target table. A double-headed arrow means that multiple rows with the source value may be found in the target table.

• If a path corresponds to a referential constraint (a primary-key/foreign-key relationship) in the direction of the arrow, the delete rule is specified at the target end. The referential constraint may allow the application programmer to skip steps of the business process because they are automatically done by the referential integrity support of the system.

• It may happen that a table is accessed recursively (for the same purpose). In this case, the arrow for the path leads back to the same table as is the case for table MAINTENANCE_RECORD/2.

It is conceivable that the recursive loop comprises multiple tables.

• The data flow for a business process (or a part of a business process) always starts with a specific table referred to as entry table. The entry table is identified by an oval labeled INPUT pointing to it. In case of the sample logical data structure, the business process starts with table MAINTENANCE_RECORD/1.

The interconnection between the INPUT box and the entry table is also labeled and described in the path summary, the entry table being the target table.




Most of the time, a subset of the rows of the entry table is selected based on the values of certain columns. The columns used for the selection are specified as target columns in the path summary. Because they do not apply to this path, the fields Source Table and Source Columns remain blank.
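The correspondence between a path and a Join operation can be sketched in SQL. The snippet below, run through Python's standard sqlite3 module, enters table MAINTENANCE_RECORD via path 1 and follows path 2 to table EMPLOYEE. The column types, the sample rows, and the function name follow_path_2 are assumptions for illustration; only the table names and the joined column come from the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Employee_Number INTEGER PRIMARY KEY,
        Last_Name TEXT);
    CREATE TABLE MAINTENANCE_RECORD (
        Maintenance_Number INTEGER PRIMARY KEY,
        Employee_Number INTEGER);
    INSERT INTO EMPLOYEE VALUES (7, 'Miller'), (8, 'Jones');
    INSERT INTO MAINTENANCE_RECORD VALUES (100, 7), (101, 8);
""")

def follow_path_2(maintenance_number):
    """Enter the entry table via path 1, then follow path 2 to EMPLOYEE."""
    return conn.execute(
        """
        SELECT src.Maintenance_Number, tgt.Employee_Number, tgt.Last_Name
        FROM MAINTENANCE_RECORD AS src       -- source table of path 2
        JOIN EMPLOYEE AS tgt                 -- target table of path 2
          ON tgt.Employee_Number = src.Employee_Number
        WHERE src.Maintenance_Number = ?     -- path 1: value supplied by INPUT
        """,
        (maintenance_number,),
    ).fetchall()

print(follow_path_2(100))
```

Because path 2 is drawn with a single-headed arrow, the join returns at most one EMPLOYEE row per maintenance record. A double-headed arrow would correspond to a join that may return several target rows for one source value.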
So much for the general description of the structure diagram. Now, let us explain how we arrived at the logical data structure for the sample business process:

1. The business process displays information about the maintenance record whose maintenance number is specified as input. Table MAINTENANCE_RECORD is the entry table for the logical data structure and the oval labeled INPUT points to it. The connecting arrow ( 1 ) is a single-headed arrow because table MAINTENANCE_RECORD can only contain a single row with the specified maintenance number.

The first step of the business process requests that the date of the maintenance record and the type of maintenance performed be displayed.

2. As a consequence of the first step of the business process, the path summary for the logical data structure contains a row for path 1. The row identifies column Maintenance_Number as target column for table MAINTENANCE_RECORD.

3. The first step of the business process needs the following columns of table MAINTENANCE_RECORD: Maintenance_Number, Date_Maintenance, and Type_Maintenance. Therefore, they are included in the table summary for table MAINTENANCE_RECORD.

4. The second step of the business process requests that the employee number and name of the mechanic who performed the maintenance be displayed.

Table MAINTENANCE_RECORD contains the employee number of the mechanic who performed the maintenance. The employee number is used to retrieve the row for the mechanic in table EMPLOYEE. The row contains the name of the mechanic. Accordingly, we have a path ( 2 ) from table MAINTENANCE_RECORD to table EMPLOYEE. The connecting arrow must be a single-headed arrow because table EMPLOYEE contains a single row for the employee number.

Note we do not need to access table MECHANIC since we do not need any data of that table. Consequently, path 2 does not correspond to a relationship type of the entity-relationship model or a referential constraint of the referential structure.

5. The path summary must include a row for path 2. The row describes that tables MAINTENANCE_RECORD and EMPLOYEE are joined via column Employee_Number. The value found for column Employee_Number in table MAINTENANCE_RECORD is used as search argument for column Employee_Number of table EMPLOYEE.

As a consequence of the second step of the business process, column Employee_Number is added to table MAINTENANCE_RECORD in the table summary.

In addition, the table summary states that the following columns of table EMPLOYEE are needed by the business process: Employee_Number, Last_Name, First_Name, and Middle_Initial.

6. The third step of the business process requests that the aircraft number of the aircraft for which the maintenance was performed be displayed. Since the aircraft number is contained in the maintenance record, the structure diagram need not change.

7. Because of Step 3 of the business process, column Aircraft_Number must be added to the columns needed from table MAINTENANCE_RECORD. The path summary remains unchanged.

8. The fourth step of the business process requests that the date when the aircraft was manufactured, the date when the aircraft was put into service, and the model number and type code for the aircraft be displayed.

If Come Aboard still has information about the aircraft, the requested information is contained in the row for the aircraft in table AIRCRAFT. To retrieve the row, the aircraft number in the maintenance record is used (path 3). The arrow must be a single-headed arrow because at most one row can be found in table AIRCRAFT for a given aircraft number.

Note that there is no relationship type interconnecting entity types AIRCRAFT and MAINTENANCE RECORD in the entity-relationship model. Remember that the maintenance records for an aircraft must be kept even if the remaining information about the aircraft is deleted. For that reason, there is also no referential constraint between the tables.

9. Because of Step 4 of the business process, the path summary must contain a row for path 3. The row shows that column Aircraft_Number of table MAINTENANCE_RECORD is used as search argument for column Aircraft_Number of table AIRCRAFT.

The table summary comprises a row for table AIRCRAFT listing all columns requested by Step 4 of the business process.

10. The fourth step of the business process also requests that the name of the manufacturer of the aircraft be displayed. To find the manufacturer name, we must use the type code for the aircraft found in table AIRCRAFT and retrieve the row for the aircraft type from table AIRCRAFT_TYPE. (We need not go to table AIRCRAFT_MODEL since we do not need model-specific information.)

The row retrieved contains the manufacturer code which is then used to retrieve the row for the manufacturer from table MANUFACTURER. The retrieved row contains the name of the manufacturer.

The structure diagram is extended by the two interconnections ( 4 and 5 ) required to accomplish the requested task.




11. As a consequence of the retrieval of the manufacturer name, the path summary contains two additional rows describing the transitions from table AIRCRAFT to table AIRCRAFT_TYPE and from table AIRCRAFT_TYPE to table MANUFACTURER.

The table summary reflects that columns Type_Code and Manufacturer_Code of table AIRCRAFT_TYPE and columns Manufacturer_Code and Company_Name of table MANUFACTURER are needed.

12. The fifth step of the business process requests that, for all subrecords of the specified maintenance record, the date and type of the maintenance be displayed.

To obtain the subrecords for the maintenance record, we must retrieve all rows of table MAINTENANCE_RECORD for which the value of column Owning_Record is equal to the maintenance number of the specified maintenance record. This is expressed by path 6 whose source and target is table MAINTENANCE_RECORD.

The structure diagram shows table MAINTENANCE_RECORD twice, and not an arrow returning to the same table, because we have two different uses of table MAINTENANCE_RECORD: Once, it is used for the original maintenance record and once for the subrecords. That the uses are different is underlined by the fact that the columns needed for the subrecords are different (fewer) and that there are different interconnections from the subrecords.

The arrow for path 6 must be double-headed because multiple subrecords may exist for a maintenance record.

13. To obtain unique references for table MAINTENANCE_RECORD in the path summary and the table summary, "/1" and "/2" are appended to the table name, respectively.

14. The path summary contains a row for path 6 describing that columns Maintenance_Number of table MAINTENANCE_RECORD/1 and Owning_Record of table MAINTENANCE_RECORD/2 are joined.

The table summary describes that columns Owning_Record, Date_Maintenance, Type_Maintenance, and Maintenance_Number of table MAINTENANCE_RECORD/2 are needed by the business process. Even though not expressed explicitly by the description of the business process, the maintenance numbers for the subrecords must be displayed to identify the subrecords. Column Maintenance_Number is also needed for a different reason as we will see in a moment.

15. Looking more closely at the description of Step 5 reveals that not only the immediate subrecords of the maintenance record are needed, but also all indirect subrecords. This means that the subrecords of the subrecords, and in turn their subrecords, are needed as well.

Thus, we need the recursion represented by path 7: the maintenance number of a subrecord is used to locate all maintenance records whose column Owning_Record contains that maintenance number. Since the interconnection corresponds to the self-referencing constraint for table MAINTENANCE_RECORD, the delete rule (CASCADE) is specified at the target end of the arrow.




16. The path summary must contain a row for path 7 describing the recursion. The table summary remains unchanged since additional columns are not needed for table MAINTENANCE_RECORD/2.
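The navigation derived in steps 1 through 16 can be sketched in SQL. The following Python snippet (standard sqlite3 module) follows paths 1 through 5 with joins and paths 6 and 7 with a recursive query. It is an illustration under assumptions, not the course's implementation: the column types, the sample rows, the function names record_summary and subrecords, and the use of outer joins (so that the maintenance record is still displayed when the aircraft data has been deleted) are not part of the original material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Employee_Number INTEGER PRIMARY KEY,
        Last_Name TEXT, First_Name TEXT, Middle_Initial TEXT);
    CREATE TABLE MANUFACTURER (Manufacturer_Code TEXT PRIMARY KEY,
        Company_Name TEXT);
    CREATE TABLE AIRCRAFT_TYPE (Type_Code TEXT PRIMARY KEY,
        Manufacturer_Code TEXT);
    CREATE TABLE AIRCRAFT (Aircraft_Number TEXT PRIMARY KEY,
        Date_Manufactured TEXT, Date_in_Service TEXT,
        Type_Code TEXT, Model_Number TEXT);
    CREATE TABLE MAINTENANCE_RECORD (Maintenance_Number INTEGER PRIMARY KEY,
        Date_Maintenance TEXT, Type_Maintenance TEXT,
        Employee_Number INTEGER, Aircraft_Number TEXT,
        Owning_Record INTEGER
            REFERENCES MAINTENANCE_RECORD (Maintenance_Number)
            ON DELETE CASCADE);
    -- Invented sample rows. Aircraft A9 is no longer stored.
    INSERT INTO EMPLOYEE VALUES (7, 'Miller', 'Ann', 'K');
    INSERT INTO MANUFACTURER VALUES ('M1', 'ACME Aircraft');
    INSERT INTO AIRCRAFT_TYPE VALUES ('T1', 'M1');
    INSERT INTO AIRCRAFT VALUES ('A1', '1995-03-01', '1995-06-01', 'T1', 'M300');
    INSERT INTO MAINTENANCE_RECORD VALUES
        (100, '2002-01-15', 'ENGINE', 7, 'A1', NULL),
        (101, '2002-01-16', 'TURBINE', 7, 'A1', 100),
        (102, '2002-01-17', 'BLADE', 7, 'A1', 101),
        (103, '2002-01-18', 'WIRING', 7, 'A1', 100),
        (200, '2002-02-01', 'CABIN', 7, 'A9', NULL);
""")

def record_summary(maintenance_number):
    """Steps 1 to 4 of the business process: paths 1 through 5."""
    return conn.execute("""
        SELECT m.Maintenance_Number, m.Date_Maintenance, m.Type_Maintenance,
               m.Employee_Number, e.Last_Name, e.First_Name, e.Middle_Initial,
               m.Aircraft_Number, a.Date_Manufactured, a.Date_in_Service,
               a.Model_Number, a.Type_Code, mf.Company_Name
        FROM MAINTENANCE_RECORD AS m
        LEFT JOIN EMPLOYEE AS e                              -- path 2
               ON e.Employee_Number = m.Employee_Number
        LEFT JOIN AIRCRAFT AS a                              -- path 3
               ON a.Aircraft_Number = m.Aircraft_Number
        LEFT JOIN AIRCRAFT_TYPE AS t                         -- path 4
               ON t.Type_Code = a.Type_Code
        LEFT JOIN MANUFACTURER AS mf                         -- path 5
               ON mf.Manufacturer_Code = t.Manufacturer_Code
        WHERE m.Maintenance_Number = ?                       -- path 1
        """, (maintenance_number,)).fetchone()

def subrecords(maintenance_number):
    """Step 5: direct subrecords (path 6) and indirect ones (path 7)."""
    return conn.execute("""
        WITH RECURSIVE sub (Maintenance_Number, Date_Maintenance,
                            Type_Maintenance) AS (
            SELECT Maintenance_Number, Date_Maintenance, Type_Maintenance
            FROM MAINTENANCE_RECORD
            WHERE Owning_Record = ?                          -- path 6
            UNION ALL
            SELECT r.Maintenance_Number, r.Date_Maintenance, r.Type_Maintenance
            FROM MAINTENANCE_RECORD AS r
            JOIN sub ON r.Owning_Record = sub.Maintenance_Number  -- path 7
        )
        SELECT * FROM sub ORDER BY Maintenance_Number
        """, (maintenance_number,)).fetchall()

print(record_summary(100))
print(subrecords(100))
```

The two queries mirror the two uses of table MAINTENANCE_RECORD: record_summary works on MAINTENANCE_RECORD/1, and subrecords works on MAINTENANCE_RECORD/2.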





Figure 10-7. Sample Path and Table Summaries CF182.0

Notes:

The visual illustrates the path summary and the table summary for the sample business process described on page 10-22. For each path of the structure diagram, the path summary lists the source table and the target table. It also specifies the source-table and target-table columns that are joined.

For each usage of a table, the table summary specifies the columns needed.

The notes for the previous visual describe how the path summary and the table summary for the sample business process are derived.

Sample Path and Table Summaries

#  Source Table          Source Columns      Target Table          Target Columns
1                                            MAINTENANCE_RECORD/1  Maintenance_Number
2  MAINTENANCE_RECORD/1  Employee_Number     EMPLOYEE              Employee_Number
3  MAINTENANCE_RECORD/1  Aircraft_Number     AIRCRAFT              Aircraft_Number
4  AIRCRAFT              Type_Code           AIRCRAFT_TYPE         Type_Code
5  AIRCRAFT_TYPE         Manufacturer_Code   MANUFACTURER          Manufacturer_Code
6  MAINTENANCE_RECORD/1  Maintenance_Number  MAINTENANCE_RECORD/2  Owning_Record
7  MAINTENANCE_RECORD/2  Maintenance_Number  MAINTENANCE_RECORD/2  Owning_Record

Path Summary

Table                 Columns
MAINTENANCE_RECORD/1  Maintenance_Number, Date_Maintenance, Type_Maintenance, Employee_Number, Aircraft_Number
EMPLOYEE              Employee_Number, Last_Name, First_Name, Middle_Initial
AIRCRAFT              Aircraft_Number, Date_Manufactured, Date_in_Service, Type_Code, Model_Number
AIRCRAFT_TYPE         Type_Code, Manufacturer_Code
MANUFACTURER          Manufacturer_Code, Company_Name
MAINTENANCE_RECORD/2  Owning_Record, Date_Maintenance, Type_Maintenance, Maintenance_Number

Table Summary
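A path summary is regular enough to be held as data. The following Python sketch records the seven paths of the sample and derives, for each path, the search condition it stands for. The class name Path, the function name search_condition, and the quoting of the table names are assumptions for illustration. Source and target columns are paired by position, which plays the role of the sequence numbers used for multi-column paths.

```python
from typing import NamedTuple, Optional, Tuple

class Path(NamedTuple):
    number: int
    source_table: Optional[str]        # None: the path starts at INPUT
    source_columns: Tuple[str, ...]
    target_table: str
    target_columns: Tuple[str, ...]

MR1, MR2 = "MAINTENANCE_RECORD/1", "MAINTENANCE_RECORD/2"
PATH_SUMMARY = [
    Path(1, None, (), MR1, ("Maintenance_Number",)),
    Path(2, MR1, ("Employee_Number",), "EMPLOYEE", ("Employee_Number",)),
    Path(3, MR1, ("Aircraft_Number",), "AIRCRAFT", ("Aircraft_Number",)),
    Path(4, "AIRCRAFT", ("Type_Code",), "AIRCRAFT_TYPE", ("Type_Code",)),
    Path(5, "AIRCRAFT_TYPE", ("Manufacturer_Code",),
         "MANUFACTURER", ("Manufacturer_Code",)),
    Path(6, MR1, ("Maintenance_Number",), MR2, ("Owning_Record",)),
    Path(7, MR2, ("Maintenance_Number",), MR2, ("Owning_Record",)),
]

def search_condition(path):
    """Return the predicate that selects the target rows of a path."""
    if path.source_table is None:
        # Entry path: the target columns are compared with input values.
        return " AND ".join(
            f'"{path.target_table}".{column} = ?'
            for column in path.target_columns)
    # Other paths: the source value is used unmodified as search argument.
    return " AND ".join(
        f'"{path.target_table}".{target} = "{path.source_table}".{source}'
        for source, target in zip(path.source_columns, path.target_columns))

for path in PATH_SUMMARY:
    print(path.number, search_condition(path))
```

For the recursive path 7, source and target are the same usage of the table, so a real query would need two distinct correlation names or a recursive query.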




Figure 10-8. An Alternate Representation CF182.0

Notes:

You might already have wondered why the path summary is needed. Indeed, you can show the joined columns immediately in the structure diagram, as done on the visual for the sample business process used so far. The arrows then point from the source column to the target column, and labels are no longer needed for the arrows. The names of the tables are outside the boxes, next to them. The table summary is still necessary since it is impractical to incorporate all needed columns into the diagram.

The resulting diagram seems to be simpler and clearer. However, this representation does not always work well. It works well for those cases where a single column is used to navigate from table to table. The representation becomes complex and blurred if you must join the tables on multiple columns and the columns are named differently in the two tables. It becomes especially confusing if you must join a table with multiple other tables on multiple columns and the columns overlap.

Furthermore, the above representation requires more space and you might find it more difficult to squeeze it onto a single page.

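An Alternate Representation

Still need table summary, but not path summary

[Visual: the structure diagram redrawn with the joined columns (Maintenance_Number, Employee_Number, Aircraft_Number, Type_Code, Manufacturer_Code, Owning_Record) shown inside the boxes. The arrows point from source column to target column, and table names such as AIRCRAFT, AIRCRAFT_TYPE, EMPLOYEE, and MANUFACTURER stand next to the boxes. The label C marks a delete rule.]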




In contrast, if you are using a path summary, you can even omit the structure diagram since path summary and table summary together contain all necessary information. The structure diagram just provides a graphical view of the flow between the tables.
Now, you have a choice. Make the best of it.
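To illustrate that the flow between the tables can be recovered from the path summary alone, the following Python sketch rebuilds a textual outline of the sample structure diagram from the path numbers, source tables, and target tables. The function name outline and the text layout are assumptions for illustration. Arrowheads and delete rules are not columns of the sample path summary shown earlier, so the sketch recovers only boxes and labeled arrows.

```python
# (path number, source table, target table); None marks the INPUT oval.
PATHS = [
    (1, None, "MAINTENANCE_RECORD/1"),
    (2, "MAINTENANCE_RECORD/1", "EMPLOYEE"),
    (3, "MAINTENANCE_RECORD/1", "AIRCRAFT"),
    (4, "AIRCRAFT", "AIRCRAFT_TYPE"),
    (5, "AIRCRAFT_TYPE", "MANUFACTURER"),
    (6, "MAINTENANCE_RECORD/1", "MAINTENANCE_RECORD/2"),
    (7, "MAINTENANCE_RECORD/2", "MAINTENANCE_RECORD/2"),
]

def outline(paths):
    """Return the diagram as indented text lines, starting at INPUT."""
    children = {}
    for number, source, target in paths:
        children.setdefault(source, []).append((number, target))
    lines = ["INPUT"]

    def visit(table, depth, seen):
        for number, target in children.get(table, []):
            recursive = target in seen          # arrow leads back to a visited table
            note = " (recursion)" if recursive else ""
            lines.append(f"{'  ' * depth}-{number}-> {target}{note}")
            if not recursive:
                visit(target, depth + 1, seen | {target})

    visit(None, 1, frozenset())
    return lines

diagram = outline(PATHS)
print("\n".join(diagram))
```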




Figure 10-9. Processes and Logical Data Structures CF182.0

Notes:

The example we studied on the previous pages only required a single logical data structure. As we have already discussed, a logical data structure describes a continuous flow through a subset of the tables of the application domain. As for Join operations, the data found in a row is used unmodified to select the rows of the next table. This entails that many business processes will require multiple logical data structures since they do not just use the values found to select the rows of the next table. Rather, they use additional criteria (search arguments) or derived search arguments. Different or additional search arguments require a separate logical data structure.

Many logical data structures are simple because they access a single table. In particular, this applies to the logical data structures involving insert, update, or delete operations because the corresponding SQL statements only allow the specification of a single table. The structure diagrams consist of an input box, the box for the table, and an arrow connecting them. The table summary lists the accessed columns of the table. The path summary shows through which columns the table is entered, i.e., the search argument for the rows retrieved, updated, or deleted, or the columns inserted. You may opt to omit the structure diagram for these logical data structures since it does not provide much information.

Processes and Logical Data Structures

Many interconnections of the logical data structures represent primary-key/foreign-key interrelationships, but not all

Most of the time, the input for a logical data structure is part of the input for the business process, but not always

Data found in one table used unmodified to select the rows of the next table

A logical data structure describes a continuous flow through a subset of the tables of the application domain

A business process may have many logical data structures

Columns needed for the tables should coincide with the data read or written by the business process as described in the process inventory

Many logical data structures are trivial:

• Just access a single table

• Structure diagram consists of input box, box for table, and connecting arrow

• Table summary lists columns needed/modified in table

• Path summary identifies search argument for table

Many paths in logical data structures represent primary-key/foreign-key interrelationships, but not all, as we have seen for the previous example.

Most of the time, the input for a logical data structure is part of the input for the business process, but not always. Secondary logical data structures may use a derived input.

The columns needed for the various tables should coincide with the data read or written by the business process as described in the data inventory (see Unit 5 - Data and Process Inventories).
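A trivial logical data structure of this kind can be sketched as follows. The Python snippet (standard sqlite3 module) shows a hypothetical business process, not taken from the course material, that deletes a maintenance record: the only table is MAINTENANCE_RECORD, and the search argument is Maintenance_Number. Because the self-referencing constraint carries the delete rule CASCADE, the system removes the subrecords itself, so the program can skip that step. The column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces referential constraints only when asked to do so.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE MAINTENANCE_RECORD (
        Maintenance_Number INTEGER PRIMARY KEY,
        Type_Maintenance TEXT,
        Owning_Record INTEGER
            REFERENCES MAINTENANCE_RECORD (Maintenance_Number)
            ON DELETE CASCADE);
    INSERT INTO MAINTENANCE_RECORD VALUES
        (100, 'ENGINE', NULL), (101, 'TURBINE', 100),
        (102, 'BLADE', 101), (200, 'CABIN', NULL);
""")

def delete_record(maintenance_number):
    """Single-table structure: INPUT -> MAINTENANCE_RECORD."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM MAINTENANCE_RECORD WHERE Maintenance_Number = ?",
            (maintenance_number,))

delete_record(100)
remaining = [row[0] for row in conn.execute(
    "SELECT Maintenance_Number FROM MAINTENANCE_RECORD"
    " ORDER BY Maintenance_Number")]
print(remaining)
```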




Figure 10-10. Example 2 - Business Process CF182.0

Notes:

The visual displays the textual description of business process Assign Captain for Flight which we already discussed in Unit 5 - Data and Process Inventories. This business process will require multiple, fairly simple, logical data structures.

Example 2 - Business Process


1. It is verified that the specified flight and pilot exist. If flight or pilot do not exist, an appropriate error message is displayed and the business process ends.

2. If pilot and flight exist, it is checked if the pilot has the license to fly the aircraft model for the leg for the flight. If the pilot cannot fly the aircraft model, an appropriate error message is displayed and the business process ends.

3. If the pilot has the license to fly the aircraft model, it is checked if the pilot has already been assigned to the flight. If the pilot is already captain or copilot for the flight, an appropriate message is displayed and the business process ends.

4. If the pilot has not yet been assigned to the flight, it is checked if another pilot is already captain for the flight. If so, a message is displayed containing employee number, last name, and first name of the current captain and the business process ends.

5. If a captain has not yet been assigned to the flight, the specified pilot becomes the captain for the flight.

6. A message is displayed confirming that the pilot has been assigned as captain to the flight. The message includes employee number, last name, and first name of the assigned captain.





Figure 10-11. Example 2 - Structure Diagrams CF182.0

Notes:

The sample business process described on the previous visual requires multiple logical data structures as explained in the following:

1. The first two steps of the business process verify that the specified flight and pilot exist and the pilot has the license to fly the aircraft model for (the leg for) the flight.

As a matter of fact, we need not explicitly verify that the specified employee is a pilot. It is sufficient to verify that he/she is among the persons having the license to fly the aircraft model for the flight. If we do not find him/her among these persons, the business process ends anyway. If he/she is among them, we know that the specified employee is a pilot. The referential constraints for table PILOT_FOR_AM enforce this: table PILOT_FOR_AM, which contains a row for every valid pilot/aircraft model combination, is constrained by table PILOT.

The point discussed represents an implementation detail. It confirms that the application programmers should participate in the establishment of the logical data structures.

Example 2 - Structure Diagrams

[Visual: five structure diagrams. Structure 1: INPUT to FLIGHT (path 1) and FLIGHT to LEG (path 2). Structure 2: INPUT to PILOT_FOR_AM (path 1). Structure 3: INPUT to PILOT_ASSIGNMENT (path 1). Structure 4: INPUT to EMPLOYEE (path 1). Structure 5: INPUT to PILOT_ASSIGNMENT (path 1).]

To verify that the flight exists, we must access table FLIGHT using the specified flight number, airport of departure, airport of arrival, and flight locator. Using the values found in columns Flight_Number, From, and To, we must navigate to table LEG to determine the aircraft model for the flight (columns Type_Code and Model_Number).

To verify that the specified pilot can fly the aircraft model, we have two choices:

• We can access table PILOT_FOR_AM just using the type code and the model number for the aircraft model. In this case, we need to retrieve all pilots that can fly the aircraft model until we have found the specified pilot or know that he/she is not in the list.

• We can access table PILOT_FOR_AM using the type code and the model number for the aircraft model and the employee number of the specified pilot. In this case, we will retrieve at most one row. If a row is returned, the specified employee is a pilot and can fly the aircraft model. If a row is not returned, the specified employee is not a pilot or cannot fly the aircraft model. In either case, he/she must not be considered for the flight.

If we took the first choice, we could continue the structure diagram to table PILOT_FOR_AM since the value found in table LEG is used unmodified to navigate to table PILOT_FOR_AM.

For the second choice, the search arguments for table PILOT_FOR_AM are the type code and model number found in table LEG and the specified employee number. Thus, a second logical data structure is required.

The first choice performs poorly, so we choose the second alternative, assuming that an index is provided for the primary key of table PILOT_FOR_AM.

Because we choose the second alternative, the structure diagram for Structure 1 ends with table LEG. Path summary and table summary for the logical data structure are illustrated on page 10-26.

2. As explained before, we will use the type code and model number for the aircraft model and the employee number for the pilot to access table PILOT_FOR_AM. Therefore, we need a second logical data structure (Structure 2). Its structure diagram is extremely simple since only one table is accessed. It consists of an input box, table PILOT_FOR_AM, and the arrow interconnecting them. The structure diagram does not continue further because we must use different inputs for the subsequent steps of the business process.

Path summary and table summary for the logical data structure are on page 10-26.
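The single-row access described for Structure 2 can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module with an in-memory database. The column types, the sample rows, and the helper name can_fly are assumptions made for the sketch and are not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The composite primary key provides the index assumed in the text.
conn.execute("""
    CREATE TABLE PILOT_FOR_AM (
        Type_Code       TEXT    NOT NULL,
        Model_Number    TEXT    NOT NULL,
        Employee_Number INTEGER NOT NULL,
        PRIMARY KEY (Type_Code, Model_Number, Employee_Number)
    )
""")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO PILOT_FOR_AM VALUES (?, ?, ?)",
    [("B737", "500", 1001), ("B737", "500", 1002), ("A320", "200", 1001)],
)

def can_fly(conn, type_code, model_number, employee_number):
    """Keyed access with all three search arguments: at most one row comes back."""
    row = conn.execute(
        "SELECT 1 FROM PILOT_FOR_AM"
        " WHERE Type_Code = ? AND Model_Number = ? AND Employee_Number = ?",
        (type_code, model_number, employee_number),
    ).fetchone()
    return row is not None

print(can_fly(conn, "B737", "500", 1002))  # a row exists for this pilot and model
print(can_fly(conn, "A320", "200", 1002))  # no row: not a pilot or not qualified
```

Because the full primary key is supplied, the lookup does not have to read the other pilots for the aircraft model, which is the reason the second alternative was chosen.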

3. Steps 3 and 4 of the business process check if the pilot has already been assigned to the flight or if another pilot is already captain for the flight.

Both questions can be answered by a single access to table PILOT_ASSIGNMENT. For this access, only the flight information (flight number, airport of departure, airport of arrival, and flight locator) is used and not the employee number of the pilot. At most, two rows are returned: one for the captain of the flight and one for the copilot. The returned rows are then examined.




The appropriate logical data structure is Structure 3. Its path summary and table summary are on page 10-26.
Note that columns Employee_Number and Pilot_Function must be retrieved to make the necessary decisions.
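The access for Structure 3 and the examination of the returned rows might look like the following sketch. It uses Python's built-in sqlite3 module. The column types, the values 'CAPTAIN' and 'COPILOT' for Pilot_Function, the sample row, and the helper name check_assignment are assumptions for illustration only. The column names From and To are quoted because they are reserved words in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PILOT_ASSIGNMENT (
        Flight_Number   TEXT    NOT NULL,
        "From"          TEXT    NOT NULL,
        "To"            TEXT    NOT NULL,
        Flight_Locator  INTEGER NOT NULL,
        Employee_Number INTEGER NOT NULL,
        Pilot_Function  TEXT    NOT NULL
    )
""")
# Hypothetical sample data: the flight has a copilot but no captain yet.
conn.execute(
    "INSERT INTO PILOT_ASSIGNMENT"
    " VALUES ('YY1842', 'FRA', 'JFK', 453, 1002, 'COPILOT')"
)

def check_assignment(conn, flight, employee_number):
    """One access by flight information only; then examine the rows."""
    rows = conn.execute(
        "SELECT Employee_Number, Pilot_Function FROM PILOT_ASSIGNMENT"
        ' WHERE Flight_Number = ? AND "From" = ? AND "To" = ?'
        " AND Flight_Locator = ?",
        flight,
    ).fetchall()
    # Step 3: is the specified pilot already assigned to the flight?
    already_assigned = any(emp == employee_number for emp, _ in rows)
    # Step 4: is there already a captain (possibly another pilot)?
    captain = next((emp for emp, fn in rows if fn == "CAPTAIN"), None)
    return already_assigned, captain

flight = ("YY1842", "FRA", "JFK", 453)
print(check_assignment(conn, flight, 1001))
print(check_assignment(conn, flight, 1002))
```

Both questions are answered from the same result set, so no second access to table PILOT_ASSIGNMENT is needed.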

4. If another pilot has already been assigned as captain to the flight, the fourth step of the business process requests that employee number, last name, and first name of that pilot be displayed. For this, we need a further logical data structure (Structure 4). Its path summary and table summary are on page 10-26.

You might ask why Structure 3 is not continued to table EMPLOYEE. Continuing the structure to table EMPLOYEE would mean that table EMPLOYEE is accessed for every row retrieved from table PILOT_ASSIGNMENT. However, this access is not needed for an existing copilot assignment, and we do not want to make unnecessary accesses for the no-error cases.

5. The fifth step of the business process assigns the specified pilot as captain to the flight. This gives rise to logical data structure Structure 5. Its path summary and table summary are on page 10-26.

At first glance, the logical data structure seems to be the same as Structure 3. However, Structure 3 was for retrieval whereas Structure 5 is for the insertion of rows, and its path summary is different. As target columns, it shows all columns of table PILOT_ASSIGNMENT, meaning that they are input for the insert request.

6. The final step of the business process (Step 6) requests that employee number, last name, and first name of the newly assigned captain be displayed. This requires an access to table EMPLOYEE. For this access, we do not need an additional logical data structure. Structure 4 can be used with the employee number of the new captain for the flight.
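Steps 5 and 6 can be sketched together: the insert for Structure 5 supplies all six columns, and the keyed access of Structure 4 is reused for the new captain. The sketch uses Python's built-in sqlite3 module. The column types, the value 'CAPTAIN', the sample employee, and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Employee_Number INTEGER PRIMARY KEY,
        Last_Name       TEXT NOT NULL,
        First_Name      TEXT NOT NULL
    );
    CREATE TABLE PILOT_ASSIGNMENT (
        Flight_Number   TEXT    NOT NULL,
        "From"          TEXT    NOT NULL,
        "To"            TEXT    NOT NULL,
        Flight_Locator  INTEGER NOT NULL,
        Employee_Number INTEGER NOT NULL,
        Pilot_Function  TEXT    NOT NULL
    );
    INSERT INTO EMPLOYEE VALUES (1001, 'Doe', 'Jane');
""")

def employee_name(conn, employee_number):
    """Structure 4: keyed access to EMPLOYEE, usable for any employee number."""
    return conn.execute(
        "SELECT Employee_Number, Last_Name, First_Name FROM EMPLOYEE"
        " WHERE Employee_Number = ?",
        (employee_number,),
    ).fetchone()

def assign_captain(conn, flight, employee_number):
    """Structure 5: all columns of PILOT_ASSIGNMENT are input for the insert."""
    conn.execute(
        "INSERT INTO PILOT_ASSIGNMENT VALUES (?, ?, ?, ?, ?, 'CAPTAIN')",
        (*flight, employee_number),
    )
    # Step 6: display the newly assigned captain via Structure 4.
    return employee_name(conn, employee_number)

print(assign_captain(conn, ("YY1842", "FRA", "JFK", 453), 1001))
```

The function employee_name does not care whether the employee number came from an existing captain row (Step 4) or from the new assignment (Step 6), which mirrors why no additional logical data structure is needed.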




Figure 10-12. Example 2 - Path and Table Summaries (1 of 3) CF182.0

Notes:

This visual illustrates the path and table summaries for logical data structures Structure 1 and Structure 2 for the second sample business process.

Example 2 - Path and Table Summaries (1 of 3)

Structure 1 - Path Summary

1 FLIGHT 1: Flight_Number, 2: From, 3: To, 4: Flight_Locator

2 FLIGHT 1: Flight_Number, 2: From, 3: To

LEG 1: Flight_Number, 2: From, 3: To

Structure 1 - Table Summary

Table Columns

FLIGHT Flight_Number, From, To, Flight_Locator

LEG Flight_Number, From, To, Type_Code, Model_Number

Structure 2 - Path Summary

1 PILOT_FOR_AM 1: Type_Code, 2: Model_Number, 3: Employee_Number

Structure 2 - Table Summary

Table Columns

PILOT_FOR_AM Type_Code, Model_Number, Employee_Number






Notes:

This visual illustrates the path and table summaries for logical data structures Structure 3 and Structure 4 for the second sample business process.



Structure 3 - Path Summary

1 PILOT_ASSIGNMENT 1: Flight_Number, 2: From, 3: To, 4: Flight_Locator

Structure 3 - Table Summary

Table Columns

PILOT_ASSIGNMENT Flight_Number, From, To, Flight_Locator, Employee_Number, Pilot_Function

Structure 4 - Path Summary

1 EMPLOYEE Employee_Number

Structure 4 - Table Summary

Table Columns

EMPLOYEE Employee_Number, Last_Name, First_Name





Notes:

This visual illustrates path summary and table summary for logical data structure Structure 5 for the second sample business process.



Structure 5 - Path Summary

1 PILOT_ASSIGNMENT 1: Flight_Number, 2: From, 3: To, 4: Flight_Locator, 5: Employee_Number, 6: Pilot_Function

Structure 5 - Table Summary

Table Columns

PILOT_ASSIGNMENT Flight_Number, From, To, Flight_Locator, Employee_Number, Pilot_Function





Figure 10-15. Characteristics of Views CF182.0

Notes:

Views are database objects representing subsets of columns and rows of one or more tables. Thus, by means of views, you can represent the way the logical data structures look at the data in the tables of the application domain.

For example, you could define a view joining tables MAINTENANCE_RECORD, AIRCRAFT, and EMPLOYEE and selecting a subset of their rows and columns:

• The rows of tables MAINTENANCE_RECORD and AIRCRAFT having the same aircraft number should be combined.

• The resulting rows should be joined with the rows of table EMPLOYEE having the same employee number.

• The view should contain the following columns of the three tables:

From table MAINTENANCE_RECORD:

Maintenance_Number, Date_Maintenance, Type_Maintenance, Aircraft_Number, and Employee_Number

From table AIRCRAFT:

Date_Manufactured, Date_in_Service, Type_Code, and Model_Number

From table EMPLOYEE:

Last_Name, First_Name, and Middle_Initial

Column Aircraft_Number of table AIRCRAFT is not needed since tables MAINTENANCE_RECORD and AIRCRAFT are joined on equal aircraft numbers. Similarly, column Employee_Number of table EMPLOYEE is not needed because tables MAINTENANCE_RECORD and EMPLOYEE are joined on equal employee numbers.

• You can add criteria for selecting specific rows. For example, you could request that only the rows for a specific maintenance number be part of the view.

The view described above represents a subset of the logical data structure for the first sample business process discussed in conjunction with logical data structures.

Characteristics of Views

Represent subsets of the data in the tables of the application domain

When defining a view, a description of the data represented by the view is stored

May comprise rows and columns of multiple tables

Selected columns can be ordered in any way desired

Selected columns can be renamed

When the data described by the view is displayed, it is presented in form of a table

Data is always up-to-date

All data comes from the base tables and not from a table corresponding to the view

A view receives a name which can be used in SQL statements where table names can be used

During execution, SQL statement replaced by SQL statement only containing actual column and table names which is then executed

Views comprising multiple tables cannot be used in INSERT, UPDATE, or DELETE statements

Basically, a view can select the rows and columns you can select by means of (a subset of) the SELECT SQL statement. The columns and rows for a view can come from a single table or from multiple tables. In the view definition, you can order the desired columns in any way you want. The order of the columns in the view definition determines the order in which the columns are made available when using a "SELECT *" to display the data for the view. You can also rename the columns in the view definition.

When defining a view to the system, you specify the appropriate SELECT statement. The SELECT statement is not executed. Rather, it is stored as description of the data belonging to the view.
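The example view joining MAINTENANCE_RECORD, AIRCRAFT, and EMPLOYEE could be defined roughly as in the following sketch, which uses Python's built-in sqlite3 module. The view name MAINTENANCE_VIEW and the column types are assumptions, and reading the stored definition from sqlite_master is specific to SQLite. Other database systems keep the description in their own catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AIRCRAFT (
        Aircraft_Number INTEGER PRIMARY KEY,
        Date_Manufactured TEXT, Date_in_Service TEXT,
        Type_Code TEXT, Model_Number TEXT
    );
    CREATE TABLE EMPLOYEE (
        Employee_Number INTEGER PRIMARY KEY,
        Last_Name TEXT, First_Name TEXT, Middle_Initial TEXT
    );
    CREATE TABLE MAINTENANCE_RECORD (
        Maintenance_Number INTEGER PRIMARY KEY,
        Date_Maintenance TEXT, Type_Maintenance TEXT,
        Aircraft_Number INTEGER, Employee_Number INTEGER
    );
    -- Defining the view stores the SELECT; it is not executed at this point.
    CREATE VIEW MAINTENANCE_VIEW AS
        SELECT M.Maintenance_Number, M.Date_Maintenance, M.Type_Maintenance,
               M.Aircraft_Number, M.Employee_Number,
               A.Date_Manufactured, A.Date_in_Service,
               A.Type_Code, A.Model_Number,
               E.Last_Name, E.First_Name, E.Middle_Initial
        FROM MAINTENANCE_RECORD M
        JOIN AIRCRAFT A ON A.Aircraft_Number = M.Aircraft_Number
        JOIN EMPLOYEE E ON E.Employee_Number = M.Employee_Number;
""")

# The stored description can be read back from SQLite's catalog table.
stored = conn.execute(
    "SELECT sql FROM sqlite_master"
    " WHERE type = 'view' AND name = 'MAINTENANCE_VIEW'"
).fetchone()[0]
print(stored.startswith("CREATE VIEW"))

# The column order of SELECT * follows the order in the view definition.
columns = [d[0] for d in conn.execute("SELECT * FROM MAINTENANCE_VIEW").description]
print(columns[:3])
```

A row criterion, such as a specific maintenance number, could be added either as a WHERE clause in the view definition or in the SELECT statement that names the view.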

Like all database objects, views receive names to allow them to be referenced. Their names can be used in SQL statements where table names can be used. To access the data of a view, you must specify the name of the view in the appropriate SQL statement. For example, if you want to retrieve the data represented by a view, you must use the name of the view (in place of a table name) in a SELECT statement.

When an SQL statement containing a view name is executed, it is replaced, by means of the view definition, by a different SQL statement only containing actual column and table names. The derived SQL statement is executed in place of the original SQL statement. The replacement concept is the reason why views comprising multiple tables cannot be used in INSERT, DELETE, or UPDATE SQL statements: The resulting SQL statement would not be valid. For the same reason, there are also other restrictions for views.

When displaying the data for a view by naming the view in a SELECT statement, the data is presented in the form of a table. Since the derived SQL statement only contains references to the base tables, i.e., the real tables used by the view, all data comes directly from the base tables. As a consequence, the data is always up-to-date.

Because the displayed data is presented in the form of a table, views are also referred to as virtual tables. They are not real tables.
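That a view holds no data of its own can be observed with a small experiment. The following sketch uses Python's built-in sqlite3 module. The view name EMPLOYEE_NAMES, the extra column Phone, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Employee_Number INTEGER PRIMARY KEY,
        Last_Name TEXT, First_Name TEXT, Phone TEXT
    );
    INSERT INTO EMPLOYEE VALUES (1001, 'Doe', 'Jane', '555-0100');
    CREATE VIEW EMPLOYEE_NAMES AS
        SELECT Employee_Number, Last_Name, First_Name FROM EMPLOYEE;
""")

before = conn.execute("SELECT * FROM EMPLOYEE_NAMES").fetchall()
# Change the base table only; the view has no storage of its own.
conn.execute("UPDATE EMPLOYEE SET Last_Name = 'Smith' WHERE Employee_Number = 1001")
after = conn.execute("SELECT * FROM EMPLOYEE_NAMES").fetchall()

print(before)
print(after)
```

The second SELECT against the view reflects the update immediately because it is resolved against table EMPLOYEE each time it is executed.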





Figure 10-16. Usage of Views CF182.0

Notes:

Views are an important tool for achieving data security for your tables since they limit the data end users or programs can see or change. By not allowing direct access to the actual tables, you can limit the access of people to the data of the views you authorized them for.

Another positive aspect of views is that end users and business processes only see the data they are interested in. Thus, the data presented to end users are more readily understandable and the programs for the business processes need not provide variables for data they do not need. Consequently, views ease the work of end users and application programmers.

Explicitly naming the columns in the view definition makes your business processes more resilient against database changes. If the sequence of the columns in the database changes due to the redefinition of a table, end users and programs using the view will not notice the changes and are not impacted. If the actual names of columns change, you can change the view definition in such a way that end users and programs using the view do not notice the name changes.

Usage of Views

Views allow you to limit the data an end user can see or change

Data security

Complementary to logical data structures

One or more views for each logical data structure

A view may serve multiple data structures

End users/Business processes only see data they are interested in

Ease of use

Explicitly name columns in views

Resilience against changes in base tables

Do not allow end users or business processes to access base tables

Freedom to change and extend base tables




By explicitly naming the columns in the view, you also ensure that end users or programs do not notice the addition of new columns they are not interested in.

From a design perspective, you should not allow end users or programs to directly access the base tables, i.e., the actual tables. As a consequence, you have more freedom to change and extend the tables as long as you ensure that the external appearance of the views remains unchanged. Furthermore, if all columns are explicitly named in the view definition, end users and programs selecting all columns via "SELECT *" are not impacted if new columns are added to the table that are not contained in the view definition.
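The effect of explicitly named view columns can be tried out as in the following sketch, which uses Python's built-in sqlite3 module. The view name AIRCRAFT_V, the added column Date_Acquired, and the helper column_names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AIRCRAFT (
        Aircraft_Number INTEGER PRIMARY KEY,
        Type_Code TEXT, Model_Number TEXT
    );
    INSERT INTO AIRCRAFT VALUES (1, 'B737', '500');
    -- The view names its columns explicitly instead of using SELECT *.
    CREATE VIEW AIRCRAFT_V AS
        SELECT Aircraft_Number, Type_Code, Model_Number FROM AIRCRAFT;
""")

def column_names(conn, name):
    """Return the column names delivered by SELECT * for a table or view."""
    return [d[0] for d in conn.execute(f"SELECT * FROM {name}").description]

view_before = column_names(conn, "AIRCRAFT_V")
# Extend the base table with a column the view users are not interested in.
conn.execute("ALTER TABLE AIRCRAFT ADD COLUMN Date_Acquired TEXT")
view_after = column_names(conn, "AIRCRAFT_V")
table_after = column_names(conn, "AIRCRAFT")

print(view_before == view_after)
print(table_after)
```

A program that selects all columns from the view keeps receiving the same three columns, whereas a program reading the base table directly would suddenly receive a fourth one.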

As we have illustrated by means of the example in the notes for the previous visual, views complement the logical data structures. For a logical data structure, you may have multiple views. Conversely, a single view may serve multiple logical data structures.




Checkpoint

1. Name the two major inputs for the development of the logical data structures.

_____________________________________________________

_____________________________________________________

_____________________________________________________

2. What are the two main purposes of logical data structures?

_____________________________________________________

_____________________________________________________

_____________________________________________________

3. A logical data structure reflects the data flow between the tables of the application domain for a business process or a part of it. (T/F)

4. Since the logical data structures are intended for the application programmers, the database designer is not involved in their development. (T/F)

5. Which of the following choices are correct? If problems are detected during the development of the logical data structures, the database designer should ...

a. Patch the tables and not worry about the earlier design steps.

b. Verify all steps of the design process.

c. Restart the design process with the establishment of the tables.

d. Restart the design process with the establishment of the tuple types.

6. Name the components of a logical data structure.

_____________________________________________________

_____________________________________________________




7. What is the purpose of the structure diagram?

_____________________________________________________

_____________________________________________________

_____________________________________________________

8. What does the path summary specify?

_____________________________________________________

_____________________________________________________

_____________________________________________________

9. What does the table summary specify?

_____________________________________________________

_____________________________________________________

_____________________________________________________

10. All interconnections in a structure diagram are primary-key/foreign-key interrelationships. (T/F)

11. Views are only descriptions of data. They are not real tables. (T/F)

12. Name four advantages of views.

_____________________________________________________

_____________________________________________________

_____________________________________________________






Notes:

Unit Summary

Views provide data security, ease of use, resilience against database changes, and freedom to change tables

Views allow you to describe subsets of the tables of the application domain

The same view may be used by multiple logical data structures

For a logical data structure, you can have one or more views

The logical data structures for a business process identify:

The tables and columns for the data elements of the business process

How the tables for the business process are interconnected

The table summary describes which columns are needed for the different uses of the tables

The components of logical data structures are: structure diagram, path summary, and table summary

The structure diagram is a graphical representation of the interconnections between the tables

The path summary describes the interconnections between the tables

From table to table via columns







Appendix A. Sample Problem Statement
Overview

Come Aboard (CAB) is an airline servicing a set of airports with its aircraft. As employees, it has pilots flying the aircraft, mechanics maintaining and servicing the aircraft, and other personnel for various service functions.

CAB wants to administer flight planning, pilot assignment, and aircraft maintenance activities by means of a database management system.

Business Object Types

CAB wants to store information about the following business object types in its database:

Aircraft Models

For its flying activities, CAB uses aircraft of different types or, more precisely, models such as Boeing 737, Model 500, or Airbus A320, Model 200. For the aircraft models it owns or has on order, CAB wants to maintain information in its database such as their category (e.g., JET or TURBOPROP), length, height, wing span, or number of engines.

The aircraft models can be uniquely identified by their type code (e.g., B737) together with their model number (e.g., 500).

Aircraft

CAB owns multiple aircraft of the various aircraft models. For the aircraft it owns, CAB wants to maintain information such as the date when the aircraft was acquired, the engines mounted on the aircraft, or the seats of the aircraft.

Each aircraft has a unique serial number. This serial number is unique across aircraft models.

Airports

CAB services a set of airports with its aircraft. For these airports, as well as for airports CAB plans to service in the near future, CAB wants to keep information in its database such as the airport code, the location of the airport, the address of CAB's city ticketing office, or the address of CAB's airport office.

The airport codes uniquely identify the various airports.



Pilots

CAB wants to store information (e.g., name, address, phone number, or date of previous medical check-up) for its pilots. Like every employee, pilots have a unique employee serial number.

Mechanics

CAB wants to store information (e.g., name, address, phone number, or area of expertise) for its mechanics. Like every employee, mechanics have a unique employee serial number.

Itineraries

Itineraries are ordered collections of consecutive nonstop connections between airports which are called legs. This means that the ending airport for the previous leg is always the starting airport for the next leg.

Itineraries have unique flight numbers (e.g., YY1842). All legs of an itinerary are operated under the flight number of the itinerary. CAB wants to maintain information about the itineraries such as the seating classes offered, the weekdays on which the itinerary is operated (starting days), and the planned departure and arrival times for the legs.

Flights

A flight is a scheduled or executed nonstop trip between two airports. Flights are always related to the legs of itineraries. The information kept about flights includes, for example, the estimated departure and arrival times (which might be different from the planned departure and arrival times for the appropriate leg because of delays) and the actual departure and arrival times.

The individual flights can be identified by means of a sequence number, referred to as flight locator, which is unique per itinerary and leg. Thus, to identify a particular flight, you need to know the flight number for the itinerary (e.g., YY1842), the airports for the legs (e.g., FRA - JFK), and the flight locator (e.g., 453) for the flight.
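The way a flight is identified can be expressed as a composite key, as in the following sketch using Python's built-in sqlite3 module. The table layout, the column types, the airport code ORD for a second leg, and the helper add_flight are assumptions for illustration. The column names From and To are quoted because they are reserved words in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE FLIGHT (
        Flight_Number  TEXT    NOT NULL,
        "From"         TEXT    NOT NULL,
        "To"           TEXT    NOT NULL,
        Flight_Locator INTEGER NOT NULL,
        PRIMARY KEY (Flight_Number, "From", "To", Flight_Locator)
    )
""")

def add_flight(conn, flight_number, origin, destination, locator):
    """Insert a flight; return False if the full identifier already exists."""
    try:
        conn.execute(
            "INSERT INTO FLIGHT VALUES (?, ?, ?, ?)",
            (flight_number, origin, destination, locator),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add_flight(conn, "YY1842", "FRA", "JFK", 453),
    add_flight(conn, "YY1842", "JFK", "ORD", 453),  # same locator, other leg
    add_flight(conn, "YY1842", "FRA", "JFK", 453),  # exact duplicate
]
print(results)
```

The flight locator alone does not identify a flight. It only becomes unique in combination with the flight number and the airports of the leg.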

Maintenance Records

As the aircraft are maintained, maintenance records are established for them. The information gathered as part of the maintenance records includes, for example, the type of the maintenance performed and the date of the maintenance.

Each maintenance record has a unique sequence number referred to as maintenance number.


The following types of relationships exist between the business object types which CAB wants to implement in its database:

Aircraft Models - Aircraft

For an aircraft model, CAB may have any number of aircraft. In particular, it is possible that there are no aircraft (yet) for an aircraft model. Conversely, an aircraft belongs to one and only one aircraft model.

Aircraft Models - Airports

Before an aircraft can be used for flights to and from an airport, CAB must acquire start and landing rights for the appropriate aircraft model for this airport. An aircraft model servicing multiple airports must have start and landing rights for all these airports. For an airport serviced by different aircraft models, start and landing rights must be obtained for all aircraft models servicing the airport.

It is possible that CAB does not have any start and landing rights for an aircraft model. For example, this may happen if the airports serviced by this aircraft model are no longer serviced by CAB and, thus, dropped.

It is also possible that CAB does not have any start and landing rights for an airport in its database.

Pilots - Aircraft Models

CAB wants to record which pilots can fly the various aircraft models. Pilots may be able to fly multiple aircraft models. Conversely, an aircraft model may be flown by different pilots.

It is possible that, temporarily, a pilot cannot fly any of the aircraft models. It is also possible for an aircraft model that CAB does not have a pilot that can fly the aircraft of this model. For example, this may be the case for a newly ordered aircraft model for which CAB has not yet hired a pilot.

Airports - Itineraries

An itinerary consists of one or more legs. The legs are nonstop connections between two airports, the starting and the ending airports for the leg. Airports can be the starting or ending points for legs of multiple itineraries.

If an airport is no longer needed by CAB and is deleted, all itineraries for which the airport had been a stopover should be deleted as well.




Itineraries - Flights

For each leg of an itinerary, there may be multiple flights. These can be scheduled flights or completed flights. Completed flights are kept for a certain period of time.

A flight always applies to one leg of one itinerary.

Aircraft Models - Legs

Aircraft models are assigned to the legs of an itinerary to define the kind of aircraft for the flights for the legs. At all times, a leg must have one, and only one, aircraft model assigned to it. The assignment is made when the leg is established, but may be changed.

An aircraft model may be assigned to multiple legs. It need not be assigned to any legs.

Aircraft - Flights

Aircraft are assigned to flights. Flights represent nonstop connections. Therefore, only one aircraft is assigned to a flight. An aircraft can be assigned to multiple flights. The aircraft assignment is not necessarily made at the point in time when the flight is scheduled.

It is possible that, at a given point in time, an aircraft has not been assigned to any flight.

Pilots - Flights

To each flight, one pilot is assigned as (flight) captain and another pilot as copilot. This assignment is not necessarily made at the point in time when the flight is scheduled, but at least three weeks before the flight is performed.

A pilot can function as captain or copilot for multiple flights. It is possible that, at a given point in time, a pilot does not have any flight assignments.

Mechanics - Aircraft Models

Mechanics are trained to repair the aircraft of a specific aircraft model. A mechanic can be trained for multiple aircraft models. For an aircraft model, multiple mechanics may have the required training.

It is possible that, temporarily, a mechanic does not have the training for any of the aircraft models. Conversely, it is possible that, for an aircraft model, CAB does not have a trained mechanic.

Mechanics - Aircraft

CAB wants to record which mechanics are scheduled for the next maintenance service of an aircraft. A mechanic may perform the maintenance service for multiple aircraft. Conversely, multiple mechanics may be assigned to a single aircraft.

It is possible that the next maintenance service has not yet been scheduled for an aircraft. At a given moment, it is also possible that a mechanic has not been assigned to any aircraft.

Mechanics - Maintenance Records

For every maintenance performed, a maintenance record is established by a mechanic. For each maintenance record, one and only one mechanic is responsible.

If a mechanic leaves the company, his/her maintenance records are assigned to another mechanic.

Aircraft - Maintenance Records

As an aircraft is serviced, a maintenance record for the aircraft is established. A maintenance record applies to one and only one aircraft. For an aircraft, there may be multiple maintenance records.

The maintenance records for an aircraft contain the serial number for the aircraft. All maintenance records for an aircraft must be kept for the time the aircraft is owned by CAB and for two years thereafter. This implies that the maintenance records must still be kept after the remaining information for the aircraft has been deleted.

Maintenance Records - Maintenance Records

As a consequence of a maintenance activity for an aircraft, other maintenance activities may be triggered for that aircraft. These triggered activities have their own maintenance records. CAB wants to record the relationships between maintenance records.

A maintenance record can have any number of (maintenance) subrecords. Conversely, a subrecord always belongs to one and only one maintenance record referred to as owning maintenance record. Maintenance subrecords do not have special characteristics. They are normal maintenance records and contain the same type of information as their owning maintenance records.

If a maintenance record is deleted, all its subrecords are deleted as well.
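One possible way to realize this delete rule is a self-referencing foreign key with a cascading delete, as in the following sketch using Python's built-in sqlite3 module. The column name Owning_Maintenance_Number, the column types, and the sample rows are assumptions. The problem statement itself does not prescribe any implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only if this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE MAINTENANCE_RECORD (
        Maintenance_Number INTEGER PRIMARY KEY,
        Type_Maintenance   TEXT,
        Owning_Maintenance_Number INTEGER
            REFERENCES MAINTENANCE_RECORD (Maintenance_Number)
            ON DELETE CASCADE
    )
""")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO MAINTENANCE_RECORD VALUES (?, ?, ?)",
    [
        (1, "CHECK", None),   # owning maintenance record
        (2, "REPAIR", 1),     # subrecord of record 1
        (3, "REPAIR", 2),     # subrecord of record 2
        (4, "CHECK", None),   # unrelated maintenance record
    ],
)

# Deleting the owning record removes its subrecords as well.
conn.execute("DELETE FROM MAINTENANCE_RECORD WHERE Maintenance_Number = 1")
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT Maintenance_Number FROM MAINTENANCE_RECORD"
        " ORDER BY Maintenance_Number"
    )
]
print(remaining)
```

Since subrecords are normal maintenance records, a single nullable column referring back to the same table is enough to hold the relationship. Records without an owning record simply leave it empty.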

Business Constraints

The following constraints exist for the business object types and business relationship types that CAB wants to maintain in its database:




Number of Engines on Aircraft

An aircraft cannot have more engines mounted than the aircraft model allows.

To be enforced when an engine is added to an aircraft.

The request to add an engine to an aircraft must be rejected if it violates the constraint.

Aircraft for Flight Must Belong to Aircraft Model for Leg

The aircraft assigned to a flight must belong to the aircraft model for the leg for the flight.

To be enforced when an aircraft is assigned to a flight or the aircraft assignment is changed.

The aircraft assignment must be rejected if it violates the constraint.

Also to be verified if the aircraft model for a leg is changed.

In this case, previous aircraft assignments for flights for the leg must be canceled and appropriate notifications must be given.

Captain and Copilot Must Be Different

A pilot cannot be captain and copilot for the same flight.

To be enforced when a pilot is assigned to a flight or the pilot assignment is changed.

The pilot assignment must be rejected if the pilot does not qualify for the flight.
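A constraint of this kind could, for example, be enforced by a trigger that rejects a second assignment of the same pilot to the same flight. The following sketch uses Python's built-in sqlite3 module. The table layout follows the PILOT_ASSIGNMENT columns used in the course, but the trigger name, the column types, the values for Pilot_Function, and the helper try_assign are assumptions. The sketch covers only new assignments, not changes to existing ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PILOT_ASSIGNMENT (
        Flight_Number TEXT, "From" TEXT, "To" TEXT, Flight_Locator INTEGER,
        Employee_Number INTEGER, Pilot_Function TEXT
    );
    -- Reject the insert if the pilot already has a function on this flight.
    CREATE TRIGGER CAPTAIN_NOT_COPILOT
    BEFORE INSERT ON PILOT_ASSIGNMENT
    WHEN EXISTS (
        SELECT 1 FROM PILOT_ASSIGNMENT
        WHERE Flight_Number = NEW.Flight_Number
          AND "From" = NEW."From" AND "To" = NEW."To"
          AND Flight_Locator = NEW.Flight_Locator
          AND Employee_Number = NEW.Employee_Number
    )
    BEGIN
        SELECT RAISE(ABORT, 'pilot is already assigned to this flight');
    END;
""")

def try_assign(conn, row):
    """Attempt the assignment; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO PILOT_ASSIGNMENT VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:
        return False

first = try_assign(conn, ("YY1842", "FRA", "JFK", 453, 1001, "CAPTAIN"))
second = try_assign(conn, ("YY1842", "FRA", "JFK", 453, 1001, "COPILOT"))
third = try_assign(conn, ("YY1842", "FRA", "JFK", 453, 1002, "COPILOT"))
print(first, second, third)
```

Whether such a constraint is implemented by a trigger, by a key on the flight columns together with the employee number, or in the application program is a design decision that the problem statement leaves open.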

Pilots for Flight Must Have License for Aircraft Model for Leg

A pilot for a flight must have the license to fly the aircraft model for the leg for the flight.

To be checked when a pilot is assigned to a flight or when a previous pilot assignment is changed.

The pilot assignment is to be rejected if the pilot does not qualify for the flight.

Also to be verified if the aircraft model for a leg of an itinerary is changed.

In this case, previous pilot assignments for flights for the leg must be canceled and appropriate notifications must be given.

Only Trained Mechanics for Aircraft Maintenance

A mechanic can only service an aircraft if he/she has been trained for the appropriate aircraft model.




To be checked when a mechanic is assigned to the next maintenance service for an aircraft.

The assignment is to be rejected if the mechanic has not been trained for the appropriate aircraft model.

Employees Cannot Be Pilots and Mechanics at the Same Time

An employee cannot be a pilot and a mechanic at the same time.

To be checked when a new pilot is added. Also to be checked when a new mechanic is added.

Only Aircraft Models With Start and Landing Rights for Legs

An aircraft model can only be assigned to a leg of an itinerary if it has start and landing rights for the airports of the leg.

To be checked when an aircraft model is assigned to a leg or when the aircraft model assignment is changed.

The aircraft model assignment must be rejected if it violates the constraint.







Appendix B. Checkpoint Solutions
Unit 1 - Relational Concepts

1.

The relational data model describes the conceptual representation of the data objects of relational databases and gives guidelines for their implementation.

2.

c

3.

False

4.

Fields are the columns for a particular row of a table. They are the actual receptacles for the data stored into a table.

5.

True

6.

True

7.

a, b, d

8.

The main reasons are:

- Identical rows cannot be modified or deleted individually.



- To ensure that the database design is open-ended: Future application changes may require the retrieval, update, and deletion of particular rows.

9.

False

10.

True

11.

c


Unit 2 - Views and Results During Database Design
1.

The problem statement for an application domain is a document describing the types of business objects for the application domain, the relationships between them, and the business constraints for both of them.

2.

c

3.

c, a, b

4.

False

5.

True

6.

b, a, a, c, b, b, c, b

7.

An entity-relationship model visualizes the business object types of the application domain, the relationships between them, and the business constraints for both of them.

8.

The data inventory is a description of the data elements, i.e., the elementary data, of the application domain.

9.

a, b




10.

Logical data structures apply to processes or parts of them. They describe:

- The subset of the tables (of the database for the application domain) used by the process or the pertinent part of the process.

- How the process or the part of the process must logically navigate through the tables in order to accomplish its function.

11.

False




Unit 3 - Problem Statement
1.

a, b, c, e, g

2.

The main sections of a problem statement are:

- An overview of the application domain.

- A description of the business object types.

- A description of the business relationship types.

- A description of the business constraints.

3.

The overview section should:

- Describe what the application domain does.

- Identify the areas of the application domain to be implemented in the target database.

4.

b, c

5.

True

6.

A business relationship type represents a category of business relationships, with the same meaning and characteristics, between the objects of one or more business object types.

7.

For each business relationship type, the problem statement should:

- Contain a textual description of the business relationship type.

- Identify the business object types linked by the business relationship type.




- Specify how many relationships of the same type an object can have.

- Describe if the business relationship type requires an object of a business object type for the business relationship type to have at least one relationship.

- Specify if the objects having a relationship with an object must be deleted when the object is deleted.

8.

True

9.

Cascading business relationship type.

10.

A business constraint represents a restriction for the objects of business object types, for the relationships of business relationship types, or for a mixture thereof.

11.

For each business constraint, the problem statement should:

- Contain a textual description of the restriction that must be adhered to.

- Identify the business object types or business relationship types to which the restriction applies.

- Specify when the constraint is to be applied.

- Describe the action to be performed if the constraint is violated.

12.

True




Unit 4 - Entity-Relationship Model
1.

The three major components of entity-relationship models are:

- Entity types

- Relationship types

- Constraints

2.

False

3.

An entity type is a conceptual unit representing a class of objects with the same meaning and characteristics about which information is to be stored and maintained.

An entity instance is an actual object belonging to an entity type.

4.

The entity key allows you to uniquely identify the instances belonging to an entity type.

The minimum principle requires that all attributes of the entity key are necessary for the unique identification of the instances of the entity type. If an attribute is omitted, the remaining attributes no longer uniquely identify the instances of the entity type.

5.

True

6.

A relationship type is a conceptual association between:

- The entity instances, one each, of two not necessarily different entity types.

- The relationship instances, one each, of two not necessarily different relationship types.




- The entity instances and relationship instances, one of each, of an entity type and a relationship type.

7.

True

8.

True

9.

False

10.

b, a, a, b, d, c

11.

The cardinalities for relationship type PASSENGER_has_SEAT are the following:

Cardinality for source: 0..1 or 1

Cardinality for target: 0..m or m

12.

A 1:1 relationship type is a relationship type with cardinalities ..1 at both ends of the relationship type.

A 1:m relationship type is a relationship type with cardinality ..1 at one end and cardinality ..m at the other end of the relationship type.

An m:m relationship type is a relationship type with cardinalities ..m at both ends of the relationship type.
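The three definitions depend only on the maximum cardinalities at the two ends. A minimal Python sketch, assuming cardinalities are written as strings such as 0..1, 1..m, 1, or m (the function names are hypothetical):

```python
def max_cardinality(cardinality):
    """Return the maximum part of a cardinality such as '0..1', '1..m', '1' or 'm'."""
    maximum = cardinality.split("..")[-1]
    if maximum not in ("1", "m"):
        raise ValueError(f"unsupported cardinality: {cardinality!r}")
    return maximum

def classify(source, target):
    """Classify a relationship type by the maximum cardinalities at both ends."""
    ends = {max_cardinality(source), max_cardinality(target)}
    if ends == {"1"}:
        return "1:1"
    if ends == {"m"}:
        return "m:m"
    return "1:m"
```

For example, the cardinalities given for PASSENGER_has_SEAT in answer 11 (0..1 and 0..m) classify as a 1:m relationship type.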

13.

a. For relationship type r1, any number of instances of entity type B can be connected to an instance of entity type A.




b. For relationship type r1, zero instances of entity type B need be connected to an instance of entity type A.

c. For relationship type r1, at most one instance of entity type A can be connected to an instance of entity type B.

d. For relationship type r1, zero instances of entity type A need be connected to an instance of entity type B.

e. For relationship type r2, at most one instance of entity type A can be connected to an instance of entity type C.

f. For relationship type r2, one instance of entity type A must be connected to every instance of entity type C.

14.

Since relationship type r1 has a source cardinality of 1, entity instance B3 cannot be connected to multiple instances of entity type A.

Since relationship type r2 has a source cardinality of 1..1, entity instance C1 of entity type C must be connected to one and only one instance of entity type A.

15.

The defining attributes and the relationship keys for relationship types r1 and r2 are:

Defining attributes for r1: Key of A and key of B

Relationship key for r1: Key of A (target cardinality of 1)

Defining attributes for r2: Key of r1 and key of C, i.e., key of A and key of C

Relationship key for r2: Key of r1 and key of C, i.e., key of A and key of C, since r2 is a m:m relationship type

16.

False



17.

To be a dependent entity type, the entity type must fulfill the following requirements:

- A part of its key or its entire key must be equal to the key of another entity type or of a relationship type (referred to as parent entity type or relationship type, respectively).

- There must exist a relationship type between the parent entity type or relationship type and the dependent entity type so that:

• Each instance of the dependent entity type is, at all times, connected to one and only one parent instance.

• The dependent and parent instances interconnected are those with matching key values: The value of the appropriate key portion of the dependent entity instance must be equal to the key value of the parent instance.
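The matching-key requirement can be expressed as a simple check. The sketch below is an illustration under two assumptions that are not stated in the answer: keys are represented as Python tuples, and the parent's key forms the leading portion of the dependent's key.

```python
def matches_parent(parent_key, dependent_key):
    """True if the leading key portion of the dependent equals the parent's key."""
    return dependent_key[:len(parent_key)] == parent_key

def find_parent(parent_keys, dependent_key):
    """Return the one parent key a dependent instance is connected to."""
    candidates = [key for key in parent_keys if matches_parent(key, dependent_key)]
    if len(candidates) != 1:
        raise ValueError(f"{dependent_key!r} must have exactly one parent")
    return candidates[0]
```

A dependent instance with key (A2, B1) can only be connected to the parent with key A2; connecting it to A1 violates the requirement, which is the situation described in the next answer.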

18.

Owning relationship type r1 cannot have the instance (A1, A2.B1) because the value of the appropriate key portion for the entity instance of B is different from the key value for the instance of A.

19.

By means of dependent entity types.

20.

False

21.

True




22.

Deletion of C2

-> Deletion of (C2, A3) for r1 -> Deletion of A3 (controlling property)

-> Deletion of (A3, B2) for r2 -> Deletion of B2 (controlling property)

-> Deletion of (C2, D3) for r3 -> Deletion of ((A1, B1), (C2, D3)) for r4

Remaining Instances:

23.

True

24.

The components of a class structure are:

- Supertype

- Subtypes

- Is-bundle

25.

The is-bundle is the set of _is_ relationship types connecting the supertype to its subtypes.

26.

b, c, a, d

Remaining instances after the deletion of C2 (answer 22):

Object   Instances
A        A1, A2
B        B1
C        C1, C3
D        D1, D2, D3
r1       (C1, A2)
r2       (A1, B1), (A2, B1)
r3       (C1, D1), (C1, D2)
r4       ((A1, B1), (C1, D2))
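The cascade of answer 22 can be replayed in a few lines of Python. This is a sketch under stated assumptions: the instances before the deletion are reconstructed as the union of the remaining and the deleted instances, r1 and r2 are assumed to be the relationship types with the controlling property, and the controlled end is assumed to be the second member of each instance. None of this is spelled out in the answer itself.

```python
from collections import deque

# Instances before the deletion (reconstructed, see assumptions above)
entities = {
    "A": {"A1", "A2", "A3"},
    "B": {"B1", "B2"},
    "C": {"C1", "C2", "C3"},
    "D": {"D1", "D2", "D3"},
}
relationships = {
    "r1": {("C1", "A2"), ("C2", "A3")},
    "r2": {("A1", "B1"), ("A2", "B1"), ("A3", "B2")},
    "r3": {("C1", "D1"), ("C1", "D2"), ("C2", "D3")},
    "r4": {(("A1", "B1"), ("C1", "D2")), (("A1", "B1"), ("C2", "D3"))},
}
# Assumed: deleting an instance of r1 or r2 also deletes its second member
CONTROLLED_MEMBER = {"r1": 1, "r2": 1}

def delete_with_cascade(start):
    """Delete an object and everything the deletion drags along."""
    pending = deque([start])
    deleted = []
    while pending:
        obj = pending.popleft()
        if obj in deleted:
            continue
        deleted.append(obj)
        for name, instances in relationships.items():
            for instance in list(instances):
                if instance == obj:
                    # The relationship instance itself is being deleted.
                    instances.discard(instance)
                    if name in CONTROLLED_MEMBER:
                        pending.append(instance[CONTROLLED_MEMBER[name]])
                elif obj in instance:
                    # A relationship instance loses one of its members.
                    pending.append(instance)
        for members in entities.values():
            members.discard(obj)
    return deleted

deleted = delete_with_cascade("C2")
```

After the call, entities and relationships hold the remaining instances listed in the table above.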


© Copyright IBM Corp. 2000, 2002 Appendix B. Checkpoint Solutions B-11

27.

The instances of entity types and relationship types can be restricted by means of constraints.

28.

The three components of constraints are:

The constraining objects

The constrained objects

The rule specifying how the constraining objects restrict the instances of the constrained objects.

29.

The format of a constraint in the entity-relationship model is:

{ identifier [ : rule ] }




Unit 5 - Data and Process Inventories

1.

a, b, e, f

2.

A data inventory should contain:

- A description of the abstract data types for the application domain.

- A description of the data elements and data groups for the application domain.

3.

From the application-domain perspective, a data element is an indivisible piece of data.

A data group consists of one or more related data elements and/or data groups and, thus, generally is not an indivisible piece of data.

4.

Data elements can be associated with standard data types or abstract data types. Abstract data types are an extension of standard data types. They can be tailored to the application domain. They describe the values that the data elements associated with them can assume and the operations that can be performed with them.

5.

For an abstract data type, you should provide:

- Its signature, i.e., its name and parameters.

- The values that can be assumed.

- The operations that can be performed.

6.

a, b, d, e



7.

By associating data elements and data groups with the entity types using them as attributes, you can verify the completeness of the entity-relationship model for your application domain. If you cannot find an entity type for a data element or data group not belonging to a data group, the entity-relationship model is incomplete.
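This completeness check amounts to a set difference. A minimal Python sketch, assuming the data inventory records for each data element or data group its owning data group (None if it belongs to none); all names and sample entries are hypothetical:

```python
def uncovered_items(inventory, entity_types):
    """Return top-level data elements/groups that no entity type uses as an attribute.

    inventory maps each item to its owning data group, or None if it has none.
    entity_types maps each entity type to the set of items it uses as attributes.
    """
    used = set().union(*entity_types.values())
    return sorted(
        item
        for item, owning_group in inventory.items()
        if owning_group is None and item not in used
    )

# Hypothetical inventory and model
inventory = {
    "Address": None,
    "Street": "Address",
    "City": "Address",
    "Flight Number": None,
    "Meal Code": None,
}
entity_types = {"PASSENGER": {"Address"}, "FLIGHT": {"Flight Number"}}
missing = uncovered_items(inventory, entity_types)
```

Here Meal Code is reported: it belongs to no data group and no entity type uses it, so the model would be incomplete.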

8.

The usual methods for establishing a data inventory are:

- Surveying the departments of expertise.

- Screening existing data and programs.

- Coupling the data and process inventories.

9.

Some of the problems in surveying the departments of expertise are:

- Communicative problems:

• The application domain expert may not be able to extract the proper information from the members of the departments of expertise.

• The members of the departments of expertise may not be able to communicate their thoughts and ideas.

• Due to workload pressure, the members of the departments of expertise may be reluctant to talk with the application domain expert about database related topics.

- In discussions, it is easy to forget something.

- You may obtain data elements and data groups not actually needed.

- It is a one-time effort. Later changes are not reflected in the data inventory.

10.

The principle behind coupling the data and process inventories is the following:

- When a business process is described or updated in the process inventory, the data elements and data groups it uses are identified or changed accordingly.




- As a data element or data group for a business process is identified or changed, it is registered or changed in the data inventory. For a new data element or data group, or a new role, it is verified that the entity-relationship model contains the corresponding entity type. The entity type is named as part of the description of the data element or data group.

Consequently, the data inventory contains all data elements and data groups for the documented business processes and no superfluous data elements or data groups.

11.

b, d, e

12.

The description of a business process should contain the following items:

- Title

- Purpose

- Input

- Textual description

- Formal description

- Output

- Data read

- Data written

- Others (such as window formats or listing formats)

13.

Data read for a business process are the data elements or data groups read internally during the execution of the business process.

For each data element or data group read, its name in the data inventory and all purposes it is read for should be described.

14.

For each step of the business process, you determine the entity types and relationship types of the entity-relationship model needed to access the data elements and data groups for the step. The entity types are the receptacles for the appropriate data. The relationship types are the paths for navigating from a piece of information to another logically related piece of information needed.

When verifying the entity-relationship model for a business process, you perform a walk through the entity-relationship model and determine the view needed for the business process.

15.

Process decomposition is an iterative, step-by-step decomposition of the application domain into groups of functionally related business processes. Each iteration decomposes the groups for the previous iteration into functionally related subsets until the groups cannot be broken down any further. The result is a process tree.

The purpose of process decomposition is to obtain the complete set of business processes for the application domain.




Unit 6 - Tuple Types

1.

True

2.

False

3.

The cardinality for an attribute determines how many values the attribute must assume at least and can assume at most in the scope it is used.

If the attribute is used as direct component of the tuple type, the cardinality specifies how many values the attribute must assume at least and can assume at most for each tuple.

If the attribute is used as component of a composite attribute, the cardinality specifies how many values the attribute must assume at least and can assume at most for each value of the composite attribute.

4.

c, e

5.

False

6.

The tuple type for an entity type is established by compiling the data elements and data groups of the data inventory associated with the attributes of the entity type.

7.

Tuple types must not be established for:

- Owning relationship types.



- m:m relationship types being the source of a relationship type with a minimum target cardinality of 1.

- m:m relationship types being the target of a relationship type with a minimum source cardinality of 1.

8.

The components of a composite attribute are indented.

9.

In the tuple type documentation, the role of a data element or data group for an attribute can be identified by means of the AS clause:

name of data element/group AS role name

10.

d

11.

MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD

Maintenance Number, PK
Maintenance Number AS Owner

12.

a, d, f

13.

The Normal Forms describe states or quality levels for the tuple types. The higher the Normal Form of a tuple type, the more stable the tuple type is, the fewer data inconsistencies are possible, and the less redundant information it contains.

14.

The resulting tuple types no longer contain repeating groups, i.e., all attributes can assume at most one value.




15.

The attributes for repeating groups have a maximum cardinality higher than 1. This includes a maximum cardinality of * meaning that the appropriate attribute can assume any number of values within its scope.

16.

Generally, in the entity-relationship model, you need:

- A new dependent entity type.

- A new owning relationship type interconnecting the new dependent entity type and the entity type/relationship type for the original tuple type.
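The effect of removing a repeating group can be illustrated on sample tuples. In the Python sketch below, which uses hypothetical employee data, the repeating attribute moves to a new dependent tuple type whose key consists of the parent's key plus the moved value; the owning relationship is implied by the matching key values.

```python
def remove_repeating_group(tuples, key, repeating):
    """Split one repeating attribute off into a dependent tuple type."""
    parents, dependents = [], []
    for t in tuples:
        # The parent keeps everything except the repeating attribute.
        parents.append({name: value for name, value in t.items() if name != repeating})
        # Each value becomes one dependent tuple carrying the parent's key.
        for value in t[repeating]:
            dependents.append({key: t[key], repeating: value})
    return parents, dependents

# Hypothetical tuples with a repeating attribute (maximum cardinality *)
employees = [
    {"employee_no": 1, "name": "Ann", "phone": ["555-0101", "555-0102"]},
    {"employee_no": 2, "name": "Bob", "phone": []},
]
employee, employee_phone = remove_repeating_group(employees, "employee_no", "phone")
```

After the split, every attribute of both tuple types assumes at most one value per tuple.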

17.

False

18.

True

19.

Generally, in the entity-relationship model, you need:

- A new entity type.

- A new relationship type interconnecting the new entity type and the entity type/relationship type for the original tuple type.

20.

If the data groups on which the attributes of a tuple type are based have been established properly, they contain all attributes (and only those) that, during normalization, must be moved together to a new tuple type.

21.

True



Unit 7 - From Tuple Types to Tables

1.

Tuple types are translated into tables as follows:

- Each tuple type becomes a table.

- Each elementary attribute becomes a column.

- Each elementary attribute of the tuple type's primary key becomes a column of the table's primary key.
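The three translation rules can be sketched as a small generator for CREATE TABLE statements. The Python example below is an illustration only: it uses SQLite (from the Python standard library) rather than DB2, the tuple type and its data types are hypothetical, and the names are inserted into the statement unquoted, so it assumes trusted, valid identifiers.

```python
import sqlite3

def create_table_ddl(name, attributes, primary_key):
    """Build a CREATE TABLE statement for one tuple type.

    attributes maps each elementary attribute to an SQL data type;
    primary_key lists the attributes forming the tuple type's primary key.
    """
    columns = [f"{attribute} {data_type}" for attribute, data_type in attributes.items()]
    columns.append(f"PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {name} ({', '.join(columns)})"

# Hypothetical tuple type SEAT
seat_ddl = create_table_ddl(
    "seat",
    {"flight_no": "TEXT", "seat_no": "TEXT", "travel_class": "TEXT"},
    ["flight_no", "seat_no"],
)
conn = sqlite3.connect(":memory:")
conn.execute(seat_ddl)

# PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
table_info = conn.execute("PRAGMA table_info(seat)").fetchall()
column_names = [row[1] for row in table_info]
key_columns = [row[1] for row in sorted(table_info, key=lambda r: r[5]) if row[5] > 0]
```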

2.

Tuple types whose primary key values always correspond can be merged.

3.

A tuple type whose primary key values always are a subset of the primary key values of another tuple type can be imbedded in the other tuple type if the following condition is met: For each potentially imbedded tuple, at least one of its nonkey attributes has a value.

4.

True

5.

For T1 through Tn to be a perfect decomposition of T, the following condition must be satisfied as well:

At all times, each primary key value of T must occur in one and only one of the tuple types T1 through Tn.
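The condition above can be tested against the current primary key values. The following Python sketch checks only this one condition; the other conditions named in the question are not covered, and the function name is hypothetical.

```python
def each_key_in_exactly_one(super_keys, parts):
    """True if every primary key value of T occurs in exactly one of T1..Tn.

    super_keys is the set of primary key values of T;
    parts is a list with one set of primary key values per tuple type T1..Tn.
    """
    return all(
        sum(key in part for part in parts) == 1
        for key in super_keys
    )
```

For key values {1, 2, 3}, the parts {1, 2} and {3} satisfy the condition, whereas {1, 2} and {2, 3} do not (2 occurs twice), and neither do {1} and {3} (2 is missing).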

6.

The following are some reasons for not combining tuple types:

- The tuple types have nothing to do with each other.

- The tuple types are only processed together by business processes that are not performance-critical.



- Other tuple types are referentially dependent on the tuple type being eliminated.

- Limitations of the target database management system do not allow you to combine the tuple types.

7.

Some typical limitations for relational database management systems are:

- The rows must fit entirely into a single page of a chosen size. This limits the row size.

- The maximum number of rows per page is limited.

- The maximum number of columns per table is limited.

- The maximum size of a table is limited.

8.

True

9.

True

10.

False

11.

True

12.

False

13.

f, a, c, b, a, d, e, a, f, c, g, f, b, e




14.

False

15.

False

16.

System default values are system-provided, predefined, default values for the various data types. They are independent of columns.

User default values are default values you define for specific columns. Any value that is compatible with the data type of the column can be chosen as the user default for that column.

17.

You can provide your own default value for a column by specifying the value in the WITH DEFAULT clause for the column.
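A user default can be tried out with SQLite from the Python standard library. Note the assumption: SQLite spells the clause DEFAULT, whereas the answer refers to the WITH DEFAULT clause of the target database management system. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE booking (
        booking_no INTEGER PRIMARY KEY,
        status     TEXT NOT NULL DEFAULT 'OPEN'  -- user default for this column
    )
    """
)
# No value is supplied for status, so the user default is stored.
conn.execute("INSERT INTO booking (booking_no) VALUES (1)")
status = conn.execute(
    "SELECT status FROM booking WHERE booking_no = 1"
).fetchone()[0]
```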

18.

True

19.

False

20.

External user defined functions are based on programs written by you. Sourced user defined functions are based on existing built-in functions or user defined functions.

21.

True

22.

b, c, a



23.

Check constraints allow you to restrict the values of columns beyond the values permitted by the data types of the columns.
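As an illustration, the following Python sketch uses SQLite to restrict a text column to three values. The table, the column, and the permitted codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seat (
        seat_no      TEXT PRIMARY KEY,
        travel_class TEXT NOT NULL
            CHECK (travel_class IN ('F', 'B', 'E'))  -- narrower than the data type
    )
    """
)

def try_insert(seat_no, travel_class):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO seat VALUES (?, ?)", (seat_no, travel_class))
        return True
    except sqlite3.IntegrityError:
        return False
```

The data type TEXT would accept any string; the check constraint rejects every value other than F, B, or E.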

24.

False

25.

A trigger is a set of actions to be performed when a specific event occurs.

26.

False

27.

True

28.

A trigger can be activated before the changes for the row or SQL statement are applied or after they have been applied.
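The two activation times can be seen side by side in the following Python sketch. It uses SQLite, which supports only row-level triggers, so statement-level activation is not shown; the tables and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE booking (booking_no INTEGER PRIMARY KEY, seats INTEGER NOT NULL);
    CREATE TABLE booking_log (booking_no INTEGER NOT NULL);

    -- Before trigger: runs before the row is inserted and can reject it.
    CREATE TRIGGER booking_before_insert BEFORE INSERT ON booking
    WHEN NEW.seats <= 0
    BEGIN
        SELECT RAISE(ABORT, 'seats must be positive');
    END;

    -- After trigger: runs once the row has been inserted.
    CREATE TRIGGER booking_after_insert AFTER INSERT ON booking
    BEGIN
        INSERT INTO booking_log (booking_no) VALUES (NEW.booking_no);
    END;
    """
)

conn.execute("INSERT INTO booking VALUES (1, 2)")
try:
    conn.execute("INSERT INTO booking VALUES (2, 0)")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

logged = conn.execute("SELECT booking_no FROM booking_log").fetchall()
```

Only the first insertion reaches the after trigger; the second one is stopped by the before trigger and leaves no log entry.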

29.

a, e, f

30.

a, b, c, d, e

31.

True

32.

True




Unit 8 - Integrity Rules

1.

The four basic types of integrity to be maintained for a database are:

- Referential integrity

- Domain integrity

- Redundancy integrity

- Constraint integrity

2.

A foreign key is an ordered set of columns whose values are, at all times, a subset of the values of a parent key of the same or another table.

3.

True

4.

e, b, d, a, c

5.

NO ACTION checks for orphans after the deletion of the rows of the parent table and rejects the request if orphans are detected.

RESTRICT checks for dependent rows before the deletion of the rows of the parent table and rejects the request if dependent rows are found.
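The outcome of the delete rules can be observed with SQLite, which must first be told to enforce foreign keys. In the simple case sketched below, NO ACTION and RESTRICT both reject the deletion; the difference in the time of checking does not become visible here. CASCADE and SET NULL are included for comparison. The tables are hypothetical.

```python
import sqlite3

def delete_parent(rule):
    """Delete a parent row under the given delete rule and report what happened."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    conn.executescript(
        f"""
        CREATE TABLE flight (flight_no TEXT PRIMARY KEY);
        CREATE TABLE booking (
            booking_no INTEGER PRIMARY KEY,
            flight_no  TEXT REFERENCES flight (flight_no) ON DELETE {rule}
        );
        INSERT INTO flight VALUES ('CA100');
        INSERT INTO booking VALUES (1, 'CA100');
        """
    )
    try:
        conn.execute("DELETE FROM flight WHERE flight_no = 'CA100'")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    dependents = conn.execute("SELECT flight_no FROM booking").fetchall()
    conn.close()
    return outcome, dependents
```

The function returns the outcome together with the foreign key values left in the dependent table, so the effect on the dependent rows can be inspected for each rule.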

6.

True

7.

a, b



8.

The deletion of a parent row fails if:

- Another referential constraint with delete rule NO ACTION or RESTRICT prevents the deletion of the parent row.

- Another referential constraint with delete rule NO ACTION or RESTRICT for which the dependent table is the parent table prevents the deletion of a dependent row.

9.

You need an after trigger for the table for the relationship type. The trigger must be activated for each deletion of a row for the relationship type and must delete the row for the appropriate source instance.

10.

Table T is delete-connected to table T1 if the deletion of a row of T1 requires that rows of T are accessed.

11.

False

12.

True

13.

True

14.

For referential cycles, the following restrictions exist:

- For a cycle of two or more tables, at least two delete rules must be different from CASCADE.

- For a self-referencing constraint, the delete rule must be NO ACTION or CASCADE.




15.

False

16.

The purpose of a referential structure is to provide an overview of the referential constraints for the tables of an application domain or a subset thereof.

17.

True

18.

A double-headed arrow in a referential structure indicates that a parent key value may occur more than once as foreign key value in the dependent table.

19.

Domain integrity requires that the values of the columns for the tables are correct. This means that:

- The values belong to the values supported by the abstract data types for the data elements for the columns.

- The values adhere to domain restrictions for the data elements for the columns.

- The values observe length restrictions for the data elements for the columns.

20.

The three major causes for the redundancy of data are:

- Violations of the Second Normal Form or Third Normal Form

- Multiple copies of columns or tables

- Derivable data

21.

False



22.

You can ensure the correctness of derivable data by:

- Not storing them and deriving them each time they are needed.

- Using triggers that reevaluate and store the derivable data each time data affecting the derivable data are inserted, updated, or deleted.
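The trigger approach can be sketched with SQLite. In the Python example below, the number of booked seats per flight is derivable data stored in the flight table, and three triggers reevaluate it whenever a booking is inserted, moved to another flight, or deleted. All table, column, and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flight (
        flight_no    TEXT PRIMARY KEY,
        booked_seats INTEGER NOT NULL DEFAULT 0   -- derivable from booking
    );
    CREATE TABLE booking (
        booking_no INTEGER PRIMARY KEY,
        flight_no  TEXT NOT NULL REFERENCES flight (flight_no)
    );

    CREATE TRIGGER booking_inserted AFTER INSERT ON booking
    BEGIN
        UPDATE flight
           SET booked_seats = (SELECT COUNT(*) FROM booking
                                WHERE booking.flight_no = flight.flight_no)
         WHERE flight_no = NEW.flight_no;
    END;

    CREATE TRIGGER booking_moved AFTER UPDATE OF flight_no ON booking
    BEGIN
        UPDATE flight
           SET booked_seats = (SELECT COUNT(*) FROM booking
                                WHERE booking.flight_no = flight.flight_no)
         WHERE flight_no IN (OLD.flight_no, NEW.flight_no);
    END;

    CREATE TRIGGER booking_deleted AFTER DELETE ON booking
    BEGIN
        UPDATE flight
           SET booked_seats = (SELECT COUNT(*) FROM booking
                                WHERE booking.flight_no = flight.flight_no)
         WHERE flight_no = OLD.flight_no;
    END;

    INSERT INTO flight (flight_no) VALUES ('CA100');
    INSERT INTO flight (flight_no) VALUES ('CA200');
    """
)

def booked():
    """Return the stored derivable data as {flight_no: booked_seats}."""
    return dict(conn.execute("SELECT flight_no, booked_seats FROM flight"))

conn.execute("INSERT INTO booking VALUES (1, 'CA100')")
conn.execute("INSERT INTO booking VALUES (2, 'CA100')")
after_insert = booked()
conn.execute("UPDATE booking SET flight_no = 'CA200' WHERE booking_no = 2")
after_update = booked()
conn.execute("DELETE FROM booking WHERE booking_no = 1")
after_delete = booked()
```

Recounting in each trigger is the simplest way to keep the stored value correct; adjusting the counter by one would be cheaper but relies on the stored value having been correct before.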

23.

For constraint integrity, all business constraints of the application domain must be observed.

24.

The main ingredients for achieving constraint integrity are triggers and user defined functions. Sometimes, unique indexes or referential constraints can be used.




Unit 9 - Indexes

1.

The main purpose of an index is to improve performance when locating a row would otherwise require scanning the rows of the table.

2.

True

3.

True

4.

An index is a dense index if each key value has an index entry in the lowest index level.

5.

False

6.

At most one.

7.

c, a, b

8.

Plain unique indexes can be used for:

- The primary key of a table.

- The foreign key resulting from merging the tuple type for a 1:1 relationship type.



9.

Unique-where-not-NULL indexes can be used for the foreign key resulting from imbedding the tuple type for a 1:1 relationship type.
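The behavior can be demonstrated with SQLite, whose ordinary unique indexes treat NULL values as distinct and therefore act like the unique-where-not-NULL indexes described here. This is an assumption about the demonstration tool, not about the syntax of the target database management system. The table represents a hypothetical imbedded 1:1 relationship type in which an aircraft may have no pilot assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE aircraft (
        aircraft_no INTEGER PRIMARY KEY,
        pilot_no    INTEGER            -- imbedded foreign key, NULL if unassigned
    );
    CREATE UNIQUE INDEX aircraft_pilot ON aircraft (pilot_no);

    INSERT INTO aircraft VALUES (1, NULL);
    INSERT INTO aircraft VALUES (2, NULL);  -- a second NULL is accepted
    INSERT INTO aircraft VALUES (3, 7);
    """
)

try:
    conn.execute("INSERT INTO aircraft VALUES (4, 7)")  # pilot 7 is already assigned
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute(
    "SELECT COUNT(*) FROM aircraft WHERE pilot_no IS NULL"
).fetchone()[0]
```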

10.

If you have a clustering index for a table, the database management system attempts to store the rows of the table in such a way that the physical sequence of the data pages agrees with the logical order implied by the index.

11.

c

12.

From a database design perspective, you should establish an index for:

- Each primary key.

- Each foreign key.




Unit 10 - Logical Data Structures

1.

The two major inputs for the development of the logical data structures are:

- The tables for the application domain.

- The referential structure for the application domain.

2.

The main purposes of logical data structures are to identify:

- The columns (and the tables containing the columns) corresponding to the data elements used by the business processes.

- How the business processes can navigate, with the data found, from one table to the next.

3.

True

4.

False

5.

b

6.

The components of a logical data structure are:

- The structure diagram.

- The path summary.

- The table summary.

7.

The structure diagram for a logical data structure illustrates the paths interconnecting the tables of the logical data structure.



8.

For each path of the structure diagram, the path summary specifies the source table, the target table, and the interconnected columns.

9.

For each use of a table of the logical data structure, the table summary specifies the columns needed.

10.

False

11.

True

12.

Views provide data security, ease of use, resilience against database changes, and freedom to change the table definitions.
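Two of these benefits, data security and resilience against database changes, can be seen in a short SQLite sketch. The table, the view, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Ann', 5000);

    -- The view exposes only the columns a business process may see.
    CREATE VIEW employee_public AS
        SELECT emp_no, name FROM employee;

    -- A later change to the table definition does not affect the view.
    ALTER TABLE employee ADD COLUMN hired TEXT;
    """
)

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
```

A program reading from employee_public neither sees the salary column nor notices the added hired column.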



Student NotebookV1.2.2.3

IX
Index
Numerics1:1 relationship types 4-431:m relationship types 4-43

Aabstract data types 5-9

example 5-12, 5-14, 5-15implementation considerations 7-49operations 5-11sample implementation 7-66signature 5-10values 5-10

attributes 4-9components 4-11composite attributes 4-11definition 4-9elementary attributes 4-11name 4-11properties 4-10value 4-11

Bbalanced trees 9-8

searching via an index 9-9basic entity types 4-15built-in data types 7-38

BIGINT 7-39CHARACTER 7-39character strings 7-39CLOB 7-39DATE 7-39datetime data types 7-39DBCLOB 7-39DECIMAL 7-39design considerations 7-40DOUBLE 7-39GRAPHIC 7-39INTEGER 7-39NUMERIC 7-39numeric data types 7-39REAL 7-39SMALLINT 7-39TIME 7-39TIMESTAMP 7-39VARCHAR 7-39VARGRAPHIC 7-39

bundle cardinality 4-79business constraints 3-16

action if violated 3-16affected constructs 3-16

Course materials may not be repwithout the prior written

© Copyright IBM Corp. 2000, 2002

condition 3-16textual description 3-16

business object types 3-6business process 5-42

data read 5-45data written 5-45formal description 5-43input 5-42output 5-44purpose 5-42sample business process 5-46textual description 5-43title 5-42

business relationship types 3-11

Ccandidate keys 4-12cardinalities 4-41

example 4-44, 4-46CASCADE 8-15, 8-17character strings 7-39

CHARACTER 7-39CLOB 7-39DBCLOB 7-39GRAPHIC 7-39VARCHAR 7-39VARGRAPHIC 7-39

check constraints 7-59documentation 7-85examples 7-61

class structure 4-76subtypes 4-76supertype 4-76

clustering indexes 9-13locating insertion point 9-15, 9-17partitioning indexes 9-19purpose 9-13sample insertion 9-15, 9-17

column attributes 7-41default values 7-45

column functions 7-56columns 1-4combining tuple types 7-13

considerations 7-26decomposition of super tuple types 7-23imbedding detail tuple types 7-18merging partial tuple types 7-13

Come Aboard A-1CAB A-1

composite attributes 4-11components 4-11

roduced in whole or in part permission of IBM.

Index X-1

Student Notebook

conceptual view 2-4conditional relationship types 4-43constraint integrity 8-6, 8-57

definition 8-57example 8-59, 8-61, 8-64maintaining integrity 8-57

constraint summary 8-46constraints 4-88

constrained object 4-88constraining object 4-88definition 4-88examples 4-92representation in entity-relationship model 4-90rule 4-88

controlling property 4-69cascading effect 4-71for nondefining attributes 4-73

conversion of tuple types into tables 7-7problems with one-to-one conversion 7-11

coupling data and process inventories 5-36covering subtype set 4-81criteria for entity types 4-17

Ddata element 5-6

(textual) description 5-18cardinality for usage 5-8data type 5-18domain 5-18example 5-22, 5-25, 5-26, 5-30homonyms 5-18items 5-17lengths 5-18name 5-17owning data groups 5-20owning entity types 5-21synonyms 5-18type 5-18usage 5-8

data elements 2-6data group 5-6

(textual) description 5-18cardinality for usage 5-8components 5-6example 5-24, 5-27, 5-28homonyms 5-18items 5-17name 5-17owning data groups 5-20owning entity types 5-21synonyms 5-18type 5-18usage 5-8

data inventory 5-4coupling data and process inventories 5-36

data element 5-6data group 5-6in design methodology 5-4items for data elements 5-17items for of data groups 5-17methods 5-31purpose 5-6responsibility 5-8review of existing data and programs 5-34survey of departments 5-32

data types 5-9abstract data types 5-9standard data types 5-9

datetime data types 7-39DATE 7-39TIME 7-39TIMESTAMP 7-39

decomposition of super tuple types 7-23partial decomposition 7-23perfect decomposition 7-23

default values 7-45selecting default values 7-47system default values 7-45user default values 7-46

defining attributes 4-48example 4-50

delete connection 8-34restrictions 8-34, 8-36via multiple paths 8-34

delete rules 8-6delete rules (referential integrity) 8-13

an imbed case 8-32CASCADE 8-15delete-connected tables 8-34determining via ER model 8-18for 1:1 relationship types 8-25for 1:m relationship types 8-24for dependent entity types 8-18for m:m relationship types 8-20NO ACTION 8-14referential cycles 8-36RESTRICT 8-14SET NULL 8-14

delete-connected tables 8-34denormalization 7-30dense indexes 9-7dependent entity types 4-58

characteristics 4-60owning relationship type 4-60parent entity type 4-60parent relationship type 4-60

dependent row 8-9dependent table 8-8design methodology 2-12detail tuple types 7-18

imbedding detail tuple types 7-18


X-2 Relational DB Design © Copyright IBM Corp. 2000, 2002

Student Notebook

documentation 7-81check constraints 7-85column-related items 7-89, 7-90columns 7-89table-related items 7-87tables 7-87triggers 7-90user defined distinct types 7-82user defined functions 7-83

documentation of tuple types 6-19cardinalities for attributes 6-20identification of primary key 6-20

domain integrity 8-5, 8-48value integrity 8-5

Eelementary attributes 4-11enterprise-wide entity-relationship model 4-108

consolidation of submodels 4-110entity instances 4-9

definition 4-9representation 4-13

entity key 4-12candidate keys 4-12minimum principle 4-12

entity types 4-8advices 4-19attribute representation 4-13basic entity types 4-15class structure 4-76corresponding tuple types 6-9criteria for entity types 4-17definition 4-8dependent entity types 4-58determining entity types 4-15entity key 4-12name 4-10parent entity type 4-60properties 4-10representation 4-13standard representation 4-13subtypes 4-76supertypes 4-76

entity-relationship model 4-6attributes 4-9basic considerations 4-6constraints 4-88enterprise-wide entity-relationship model 4-108entity instances 4-9entity types 4-8position in design methodology 4-6relationship instances 4-24relationship types 4-24sample view of entity-relationship model 4-104splitting into pages 4-102

exclusive subtype set 4-80external functions 7-55

Ffields 1-4first normal form 6-30

correction of entity-relationship model 6-34, 6-37definition 6-30example 6-30, 6-35instance example 6-33repeating groups 6-30solution 6-31violation 6-30

foreign key 8-8fourth normal form 6-54

definition 6-54multivalued dependency 6-54sample tuple type 6-56solution 6-60violation 6-58

Ggeneralization 4-77

Hhorizontal splitting 7-35

Iimbedding detail tuple types 7-18

determining detail tuple types 7-20implementation-independent architecture 2-4indexes 9-4

balanced trees 9-8clustering indexes 9-13dense indexes 9-7documentation 9-25in design process 9-4leaf pages 9-8nonleaf pages 9-8nonunique indexes 9-12partitioning indexes 9-19plain unique indexes 9-11purpose of an index 9-5root page 9-8structure 9-7unique indexes 9-11unique-where-not-null indexes 9-12use of indexes 9-21

insert rules 8-6insert rules (referential integrity) 8-11integrity 8-5

constraint integrity 8-6, 8-57domain integrity 8-5, 8-48


© Copyright IBM Corp. 2000, 2002 Index X-3

Student Notebook

integrity rules 8-6redundancy integrity 8-6, 8-50referential integrity 8-5

integrity rules 8-6delete rules 8-6insert rules 8-6update rules 8-6

inverse direction 4-26is-bundle 4-77

bundle cardinality 4-79

Jjoining tables 1-9

Kkey 8-7

Lleaf pages 9-8limitations for target database management system 7-28linkage of tables 1-9logical data structures 10-4

an alternate representation 10-18example 10-11, 10-22in design process 10-4interrelationship to business processes 10-20path summary 10-11purpose 10-5responsibilities 10-7sample business process 10-9sample path summaries 10-26sample path summary 10-17sample structure diagrams 10-23sample table summaries 10-26sample table summary 10-17structure diagram 10-11table summary 10-11

logical view 2-4

Mm:m relationship types 4-43mandatory relationship types 4-43merging partial tuple types 7-13

determining partial tuple types 7-15multiplicities 4-41

NNO ACTION 8-14, 8-16nondefining attributes 4-62

controlling property 4-73sample instance diagram 4-65

sample usage 4-66nonleaf pages 9-8nonunique indexes 9-12normal forms 6-28

first normal form 6-30fourth normal form 6-54second normal form 6-39third normal form 6-43

normalization 6-28first normal form 6-30fourth normal form 6-54normal forms 6-28problems with tuple types 6-28second normal form 6-39third normal form 6-43

NULL attribute 7-41nullable columns 7-41

relationship to cardinalities 7-43numeric data types 7-39

BIGINT 7-39DECIMAL 7-39DOUBLE 7-39INTEGER 7-39NUMERIC 7-39REAL 7-39SMALLINT 7-39

Ooptional relationship types 4-43owning relationship type 4-60

Pparent entity type 4-60parent key 8-7parent relationship type 4-60parent row 8-9parent table 8-8partial tuple types 7-13

merging partial tuple types 7-13partitioning indexes 9-19path summary 10-11

description 10-11example 10-17

physical pointers 1-9plain unique indexes 9-11primary direction 4-26primary key 6-7

for tuple types 6-7for tuple types for entity types 6-9for tuple types for relationship types 6-11identification in tuple types 6-20

problem statement 3-4business constraints 3-16business object types 3-6



Student Notebook

business relationship types 3-11contents 3-6overview section 3-6purpose 3-4responsibility 3-4sample business constraint 3-18sample business object type 3-9sample business relationship type 3-13sample overview section 3-8

process decomposition 5-58process tree 5-59sample process decomposition 5-60sample process tree 5-60

process inventory 5-4business process 5-42contents business process 5-42in design methodology 5-4process decomposition 5-58process tree 5-59purpose 5-40responsibilities 5-40sample business process 5-46sample process decomposition 5-60sample process tree 5-60

process-independent architecture 2-4purpose of problem statement 3-4

Rredundancy integrity 8-6, 8-50

ensuring integrity 8-50example for derivable data 8-55forms of redundancy 8-50trigger for update 8-53

referential constraint 8-8constraint summary 8-46definition 8-38documentation 8-39

referential cycles 8-36restrictions 8-36

referential integrity 8-5composite key 8-7constraint summary 8-46definition of constraints 8-38delete connection 8-34delete rules 8-13delete-connected tables 8-34dependent row 8-9dependent table 8-8documentation 8-39foreign key 8-8insert rules 8-11key 8-7parent key 8-7parent row 8-9parent table 8-8

referential constraint 8-8referential cycles 8-36referential structure 8-42self-referencing constraint 8-9self-referencing table 8-9terminology 8-7update rules 8-16

referential structure 8-42relational data model 1-2relations 6-6

refer to tuple types 6-6relationship instance diagram 4-29relationship instances 4-24, 4-35

general definition 4-35restricted definition 4-24

relationship key 4-48example 4-50minimum principle 4-48

relationship type names 4-27relationship type on owning relationship type 4-68relationship type on relationship type 4-36relationship type versus attribute 4-38relationship types 4-24, 4-35

1:1 relationship types 4-431:m relationship types 4-43cardinalities 4-41conditional relationship types 4-43controlling property 4-69corresponding tuple types 6-11defining attributes 4-48directions 4-26general definition 4-35inverse direction 4-26m:m relationship types 4-43mandatory relationship types 4-43multiple for same entity types 4-30multiplicities 4-41name for relationship type 4-27names for directions 4-27naming convention 4-27nondefining attributes 4-62optional relationship types 4-43owning relationship type 4-60parent relationship type 4-60primary direction 4-26relationship instance diagram 4-29relationship key 4-48relationship type on owning relationship type 4-68relationship type on relationship type 4-36relationship type versus attribute 4-38representation 4-26restricted definition 4-24roles 4-51sample relationship type 4-29, 4-32source for directions 4-27



Student Notebook

source for relationship type 4-27target for directions 4-27target for relationship type 4-27unary relationship types 4-32

repeating groups 6-30representation of relationship types 4-26responsibility for problem statement 3-4RESTRICT 8-14, 8-17results of conceptual view 2-6results of logical view 2-10results of storage view 2-8retrieval order of columns 1-8retrieval order of rows 1-8review of existing data and programs 5-34roles 4-51

example 4-51root page 9-8rows 1-4

Ssample business process 5-46

data read 5-56input 5-46output 5-48purpose 5-46textual description 5-47verification of ER model 5-50

sample problem statement A-1business constraints A-5business object types A-1business relationship types A-3CAB A-1Come Aboard A-1overview A-1

scalar functions 7-56
second normal form 6-39
    correction of entity-relationship model 6-41
    definition 6-39
    example 6-40
    solution 6-41
    violation 6-40

self-referencing constraint 8-9
self-referencing table 8-9
SET NULL 8-14, 8-17
source 4-27
sourced functions 7-56
specialization 4-77
standard data types 5-9
steps during conceptual view 2-6
steps during logical view 2-10
steps during storage view 2-8
storage view 2-4
structure diagram 10-11


subtype set 4-80
    covering 4-81
    exclusive 4-80

subtypes 4-76
    bundle cardinality 4-79
    is-bundle 4-77
    representation 4-76
    specialization 4-77

super tuple types 7-23
    decomposition of super tuple types 7-23

supertype 4-76
    bundle cardinality 4-79
    generalization 4-77
    is-bundle 4-77
    representation 4-76

survey of departments 5-32
system default values 7-45

T

table functions 7-56
table summary 10-11


tables 1-4, 7-7
    built-in data types 7-38
    check expressions 7-59
    column attributes 7-41
    conversion of tuple types into tables 7-7
    documentation 7-87
    token translation tables 7-78

target 4-27
third normal form 6-43
    correction of entity-relationship model 6-48
    definition 6-43
    example 6-44, 6-50
    functional dependency 6-43
    instance example 6-47
    solution 6-45
    violation 6-44

token translation tables 7-78
    an alternative 7-79

triggers 7-62
    activation time 7-63
    after triggers 7-63
    before triggers 7-63
    examples 7-69
    granularity 7-63
    prerequisite conditions 7-63
    remarks 7-64
    triggered actions 7-63
    triggering operations 7-63

tuple types 2-8, 6-4
    characteristics 6-7
    conversion into tables 7-7
    decomposition of super tuple types 7-23



    definition 6-5
    denormalization 7-30
    documentation 6-17
    for entity types 6-9
    for relationship types 6-11
    horizontal splitting 7-35
    imbedding detail tuple types 7-18
    in design methodology 6-4
    merging partial tuple types 7-13
    name 6-7
    none for owning relationship type 6-13
    none for some m:m relationship types 6-14
    primary key 6-7
    relations 6-6
    renaming attributes 6-21
    required for Come Aboard 6-16
    roles 6-21
    tuples 6-5
    vertical splitting 7-33
tuples 6-5

U

unary relationship types 4-32
unique indexes 9-11
    plain unique indexes 9-11
    unique-where-not-null indexes 9-12

uniqueness of columns 1-6
uniqueness of rows 1-6
unique-where-not-null indexes 9-12
update rules 8-6
update rules (referential integrity) 8-16
    CASCADE 8-17
    NO ACTION 8-16
    RESTRICT 8-17
    SET NULL 8-17

updated maintenance view 8-41
user default values 7-46
user defined distinct types 7-51
    documentation 7-82
    example 7-53
    source data type 7-51

user defined functions 7-55
    column functions 7-56
    definition 7-57
    documentation 7-83
    external functions 7-55
    invocation 7-57
    scalar functions 7-56
    sourced functions 7-56
    table functions 7-56

V

values 1-4
vertical splitting 7-33
views 10-29
    characteristics 10-29
    uses 10-31


