73
Database Design

Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Database Design

wwwprincexmlcom
Prince - Non-commercial License
This document was created with Prince a great way of getting web content onto paper

Contents

Chapter 1 Before the Advent of Database Systems111 File-based approach for banking system 112 Data Redundancy 113 Data Isolation 214 Integrity problems215 Security problems216 Concurrency ndash access anomalies217 Database Approach 318 Role of databases in Business 319 Meaning of Data 3

Chapter 2 Fundamental Concepts 5Chapter 3 Characteristics and Benefits of a Database 731 Self-Describing Nature of a Database System 732 Insulation between Program and Data 733 Support multiple views of data 734 Sharing of data and Multiuser system 735 Control Data Redundancy 836 Data Sharing 837 Enforcing Integrity Constraints 838 Restricting Unauthorised Access 839 Data Independence 8310 Transaction Processing 9311 Providing multiple views of data 9312 Providing backup and recovery facilities 9313 Managing information 9

Chapter 4 Types of Database Models 1141 High-level Conceptual Data models 11

411 Record-based Logical Data models11

Chapter 5 Data Modelling 1351 Degree of Data Abstraction 1352 External models 1453 Conceptual model 1454 Internal model 1455 Physical model 15

551 Data Abstraction Layer 15552 Schemas 16553 Logical and Physical Data Independence 17554 Logical data independence 17555 Physical data independence 17

Chapter 6 Classification of Database Systems 1861 Based on data model 1862 Based on the number of users 1863 Based on the ways database is distributed 18

631 Centralized Systems18

632 Distributed database system19633 Homogeneous distributed Database Systems 19634 Heterogeneous distributed Database Systems 19

Chapter 7 The Relational Data Model 2071 Fundamental concepts in Relational Data Model 20

711 Relation20712 Table 20713 Column 21714 Domain 21715 Records22

72 Degree 2373 Properties of Tables 23

731 Terminology 23

Chapter 8 Entity Relationship Model2581 Entity Entity Set and Entity Type 25

811 Types of Entities26812 Attributes27813 Types of Attributes 27814 Keys 29

8141 Candidate key 298142 Composite key 298143 Primary key 308144 Secondary key308145 Alternate key308146 Foreign key308147 Nulls 31

815 Entity Types 328151 Existence Dependency 328152 Weak Entities 328153 Strong Entities 32

816 Relationships 338161 1M relationship 338162 11 relationship338163 11 Example338164 MN relationships 348165 MN relationships 34

817 Unary Relationships (recursive)35818 Ternary Relationships 35819 Relationship Strength 36

Chapter 9 Integrity Rules and Constraints 3791 Domain Integrity 3792 Referential integrity Examples 3793 Referential Integrity in MS Access 3894 Referential Integrity using Transact SQL (MS SQL Server) 3895 Foreign Key Rules 39

951 Business Rules 39

96 Cardinality 40961 Relationship Participation 40

97 Optional 4198 Mandatory 41

981 Relationship Types 43

Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45

Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49

1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50

113 Additional rules 501131 Union 501132 Decomposition 51

114 Dependency Diagram 51

Chapter 12 Normalization 52121 Normal Forms52

1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53

122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54

123 3rd Normal Form541231 Remove Transitive Dependencies54

12311 Update anomalies in 3rd normal form 55

1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58

12331 Normalization and Database Design 59

Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67

Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68

Chapter 1 Before the Advent ofDatabase Systems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system

Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc

Fig 11

httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)

11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Keeping organizational information in this approach has a number of disadvantagesincluding

12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as

1

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 2: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Contents

Chapter 1 Before the Advent of Database Systems111 File-based approach for banking system 112 Data Redundancy 113 Data Isolation 214 Integrity problems215 Security problems216 Concurrency ndash access anomalies217 Database Approach 318 Role of databases in Business 319 Meaning of Data 3

Chapter 2 Fundamental Concepts 5Chapter 3 Characteristics and Benefits of a Database 731 Self-Describing Nature of a Database System 732 Insulation between Program and Data 733 Support multiple views of data 734 Sharing of data and Multiuser system 735 Control Data Redundancy 836 Data Sharing 837 Enforcing Integrity Constraints 838 Restricting Unauthorised Access 839 Data Independence 8310 Transaction Processing 9311 Providing multiple views of data 9312 Providing backup and recovery facilities 9313 Managing information 9

Chapter 4 Types of Database Models 1141 High-level Conceptual Data models 11

411 Record-based Logical Data models11

Chapter 5 Data Modelling 1351 Degree of Data Abstraction 1352 External models 1453 Conceptual model 1454 Internal model 1455 Physical model 15

551 Data Abstraction Layer 15552 Schemas 16553 Logical and Physical Data Independence 17554 Logical data independence 17555 Physical data independence 17

Chapter 6 Classification of Database Systems 1861 Based on data model 1862 Based on the number of users 1863 Based on the ways database is distributed 18

631 Centralized Systems18

632 Distributed database system19633 Homogeneous distributed Database Systems 19634 Heterogeneous distributed Database Systems 19

Chapter 7 The Relational Data Model 2071 Fundamental concepts in Relational Data Model 20

711 Relation20712 Table 20713 Column 21714 Domain 21715 Records22

72 Degree 2373 Properties of Tables 23

731 Terminology 23

Chapter 8 Entity Relationship Model2581 Entity Entity Set and Entity Type 25

811 Types of Entities26812 Attributes27813 Types of Attributes 27814 Keys 29

8141 Candidate key 298142 Composite key 298143 Primary key 308144 Secondary key308145 Alternate key308146 Foreign key308147 Nulls 31

815 Entity Types 328151 Existence Dependency 328152 Weak Entities 328153 Strong Entities 32

816 Relationships 338161 1M relationship 338162 11 relationship338163 11 Example338164 MN relationships 348165 MN relationships 34

817 Unary Relationships (recursive)35818 Ternary Relationships 35819 Relationship Strength 36

Chapter 9 Integrity Rules and Constraints 3791 Domain Integrity 3792 Referential integrity Examples 3793 Referential Integrity in MS Access 3894 Referential Integrity using Transact SQL (MS SQL Server) 3895 Foreign Key Rules 39

951 Business Rules 39

96 Cardinality 40961 Relationship Participation 40

97 Optional 4198 Mandatory 41

981 Relationship Types 43

Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45

Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49

1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50

113 Additional rules 501131 Union 501132 Decomposition 51

114 Dependency Diagram 51

Chapter 12 Normalization 52121 Normal Forms52

1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53

122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54

123 3rd Normal Form541231 Remove Transitive Dependencies54

12311 Update anomalies in 3rd normal form 55

1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58

12331 Normalization and Database Design 59

Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67

Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68

Chapter 1 Before the Advent ofDatabase Systems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system

Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc

Fig 11

httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)

11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Keeping organizational information in this approach has a number of disadvantagesincluding

12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as

1

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 3: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

632 Distributed database system19633 Homogeneous distributed Database Systems 19634 Heterogeneous distributed Database Systems 19

Chapter 7 The Relational Data Model 2071 Fundamental concepts in Relational Data Model 20

711 Relation20712 Table 20713 Column 21714 Domain 21715 Records22

72 Degree 2373 Properties of Tables 23

731 Terminology 23

Chapter 8 Entity Relationship Model2581 Entity Entity Set and Entity Type 25

811 Types of Entities26812 Attributes27813 Types of Attributes 27814 Keys 29

8141 Candidate key 298142 Composite key 298143 Primary key 308144 Secondary key308145 Alternate key308146 Foreign key308147 Nulls 31

815 Entity Types 328151 Existence Dependency 328152 Weak Entities 328153 Strong Entities 32

816 Relationships 338161 1M relationship 338162 11 relationship338163 11 Example338164 MN relationships 348165 MN relationships 34

817 Unary Relationships (recursive)35818 Ternary Relationships 35819 Relationship Strength 36

Chapter 9 Integrity Rules and Constraints 3791 Domain Integrity 3792 Referential integrity Examples 3793 Referential Integrity in MS Access 3894 Referential Integrity using Transact SQL (MS SQL Server) 3895 Foreign Key Rules 39

951 Business Rules 39

96 Cardinality 40961 Relationship Participation 40

97 Optional 4198 Mandatory 41

981 Relationship Types 43

Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45

Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49

1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50

113 Additional rules 501131 Union 501132 Decomposition 51

114 Dependency Diagram 51

Chapter 12 Normalization 52121 Normal Forms52

1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53

122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54

123 3rd Normal Form541231 Remove Transitive Dependencies54

12311 Update anomalies in 3rd normal form 55

1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58

12331 Normalization and Database Design 59

Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67

Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68

Chapter 1 Before the Advent ofDatabase Systems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system

Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc

Fig 11

httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)

11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Keeping organizational information in this approach has a number of disadvantagesincluding

12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as

1

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 4: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

96 Cardinality 40961 Relationship Participation 40

97 Optional 4198 Mandatory 41

981 Relationship Types 43

Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45

Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49

1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50

113 Additional rules 501131 Union 501132 Decomposition 51

114 Dependency Diagram 51

Chapter 12 Normalization 52121 Normal Forms52

1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53

122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54

123 3rd Normal Form541231 Remove Transitive Dependencies54

12311 Update anomalies in 3rd normal form 55

1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58

12331 Normalization and Database Design 59

Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67

Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68

Chapter 1 Before the Advent ofDatabase Systems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system

Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc

Fig 11

httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)

11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Keeping organizational information in this approach has a number of disadvantagesincluding

12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as

1

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 5: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68

Chapter 1 Before the Advent ofDatabase Systems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system

Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc

Fig 11

httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)

11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Keeping organizational information in this approach has a number of disadvantagesincluding

12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as

1

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 6: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 1 Before the Advent ofDatabase Systems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system

Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc

Fig 11

httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)

11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Keeping organizational information in this approach has a number of disadvantagesincluding

12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as

1

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 7: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull inconsistency in data format

bull the same information may be kept in several different place (files)

bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort

13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files

14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull

bull It is difficult to make changes to the programs to enforce new constraints

15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull There are constraints regarding accessing privileges

bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints

16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously

These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach

Chapter 1 2

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 8: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea

18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways

Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations

19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element

If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it

bull Filing cabinets

bull Spreadsheets

bull Folders

bull Ledgers Lists

bull Piles of papers on your desk

They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information

3

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 9: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)

Chapter 1 4

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 10: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties

bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information

bull A database is logical coherent and internally consistent

bull A database is designed built and populated with data for a specific purpose

bull Each data item is stored in a field

bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee

A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information

Fig 21

httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image

5

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 11: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 22

httpscnxorgcontentm28150latest

Chapter 2 6

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 12: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 3 Characteristics andBenefits of a Database

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a number of characteristics that distinguish the database approach with thefile-based approach

31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs

32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed

33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users

34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies

7

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 13: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity

35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum

36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data

37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc

38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts

39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram

Chapter 3 8

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 14: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity

311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored

312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing

313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have

We often need to access and re-sort data for various uses These may include

bull Creating mailing lists

bull Writing management reports

bull Generating lists of selected news stories

bull Identifying various client needs

The processing power of a database allows it to manipulate the data it houses so itcan

9

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 15: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange

So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed

bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to

bull A web site that is capturing registered users

bull A client tracking application for social service organisations

bull A medical record system for a health care facility

bull Your personal address book in your e-mail client

bull A collection of word processed documents

bull A system that issues airline reservations

Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)

Chapter 3 10

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 16: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 4 Types of Database Models

41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project

411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model

bull The Relational model represents data as relations or tables

bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type

Fig 41 httpcnxorgcontentm28150latest

The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation

11

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 17: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 42 httpcnxorgcontentm28150latest

Chapter 4 12

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 18: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to

bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)

bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )

bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)

The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database

51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction

13

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 19: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS

52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Is the userrsquos view of the database

bull contains multiple different external views

bull is closely related to the real world as perceived by each user

53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull These models provide a flexible data structuring capabilities

bull Community view the logical structure of the entire database

bull Contains lsquoWhatrsquo data is stored bull

bull Shows relationships among data

bull constraints

bull semantic information (business rules ndash discussed later)

bull security amp integrity information

bull A database is considered as a collection of entities (objects) of variouskinds

bull Basis for identification and high-level description of main data objects avoidsdetails

bull They are database independent It does not matter what database you will beusing

54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Database is considered as a collection of fixed ndash size record

bull This model is closer to the physical level or file structure

bull Is a representation of database as seen by the DBMS

Chapter 5 14

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 20: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model

bull The entities in the conceptual model are mapped to the tables in the relationalmodel

bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model

55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Physical representation of the database

bull Lowest level of abstractions bull

bull It is how the data is stored dealing with

bull Run-time performance

bull Storage utilization amp compression

bull File organization amp access methods

bull Data encryption

bull Physical level ndash managed by OS

bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory

551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model

The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother

The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)

As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created

15

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 21: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel

Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design

The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data

Fig 51

552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database

bull External Schemas

bull multiple multiple subschemas = multiple external views of the data bull

bull Conceptual Schema one only

bull Data items relationships amp constraints ndash ER diagram

bull Physical Schema one only

bull Record definitions representation methods

bull Access methods indexes hashing

Chapter 5 16

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 22: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser

Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data

The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical

Source httpcnxorgcontentm28150latest

554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs

In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)

Source httpcnxorgcontentm28150latest

555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy

Source httpcnxorgcontentm28150latest

17

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 23: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 6 Classification of DatabaseSystems

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The database management systems can be classified based on several criteria

61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine

62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently

63 Based on the ways database is distributed

631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

With centralized database systems the system is stored at a single site

Fig 61 Source httpcnxorgcontentm28150latest

Chapter 6 18

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 24: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Actual database and DBMS software are distributed in various sites connected by acomputer network

Fig 62 Source httpcnxorgcontentm28150latest

633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily

634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Different sites might use different DBMS software There is additional software tosupport data exchange between sites

Source httpcnxorgcontentm28150latest

19

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 25: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model

The relational model has provided the basis for

bull Research on theory of datarelationshipconstraint

bull Numerous database design methodologies

bull The standard database access language SQL

bull Almost all modern commercial database management systems

The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo

71 Fundamental concepts in Relational Data Model

711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute

Source httpcnxorgcontentm28250latest

712 TableAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables

Chapter 7 20

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 26: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 71

713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation

The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another

Look at the following example of an ID card to see the relationship between fields andtheir data

714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example

21

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 27: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull The domain of Marital status has a set of possibilities Married Single Divorced

bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip

bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000

bull The domain of First Name is the set of character strings that represents names ofpeople

In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter

715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records

A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases

Fig 72

The example above shows us how fields can hold a range of different sorts of data ndashthis one has

bull An ID field ordinal number integer

bull A Pubdate field daymonthyear date

bull An author field Initial Surname text

bull A title field text bull

bull A body text field text

By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates

Chapter 7 22

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 28: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way

72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The degree is the number of attributes In our example above the degree is 4

73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Table has a name that is distinct from all other tables in the database

bull There are no duplicate rows each row is distinct

bull Entries in columns are atomic (no repeating groups or multivalued attributes)

bull Entries from columns are from the same domain based on their data type

number (numeric integer float smallinthellip)

character (string)

date

logical (true or false)

bull Operations combining different data types are disallowed

bull Each attribute has a distinct name

bull sequence of columns is insignificant

bull sequence of rows is insignificant

731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We have used many different terms Here is a summary Alternative 1 is the mostcommon

23

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 29: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 7 24

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 30: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The Entity Relationship Data Model (ER) has existed for over 35 years

The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations

ER modelling is based on two concepts

bull Entities that is things Eg Prof Ba Course Database System

bull Relationships that is associations or interactions between entities

Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams

The example database

For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model

This database contains the information about employees departments and projects

bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department

bull A department controls a number of projects each of which has unique name aunique number and the budget

bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects

bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee

81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An entity is an object in the real world with an independent existence and can bedifferentiated from other objects

An entity might be

bull An object with physical existence Eg a lecturer a student a car

25

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 31: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull An object with conceptual existence Eg a course a job a position

An Entity Type defines a collection of similar entities

An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box

Fig 81 Source httpcnxorgcontentm28250latest

811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics

bull they are the lsquobuilding blocksrsquo of a database

bull the primary key may be simple or composite

bull the primary key is not a foreign key

bull they do not depend on another entity for their existence

Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning

bull Dependent entities are used to connect two kernels together

bull They are said to be existent dependent on two or more tables

bull Many to many relationships become associative tables with at least two foreignkeys

bull They may contain other attributes

bull The foreign key identifies each associated table

bull There are three options for the primary key

bull use a composite of foreign keys of associated tables if unique

bull use a composite of foreign keys and qualifying column

bull create a new simple primary key

Characteristic entities provide more information about another table These entitieshave the following characteristic

Chapter 8 26

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 32: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull They represent multi-valued attributes

bull They describe other entities

bull They typically have a one to many relationship

bull The foreign key is used to further identify the characterized table

bull Options for primary key are as follows

bull foreign key plus a qualifying column

bull or create a new simple primary key

bull Employee(EID Name Address Age Salary)

bull EmployeePhone(EID Phone)

812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)

Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram

In the diagram each attribute is represented by an oval with a name inside

Fig 82 Source httpcnxorgcontentm28250latest

813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model

Simple attributes are attributes that are drawn from the atomic value domains

eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes

27

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 33: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo

Fig 83 Source httpcnxorgcontentm28250latest

Multivalued attributes Attributes that have a set of values for each entity

eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo

Fig 84 Source httpcnxorgcontentm28250latest

Derived attributes Attributes contain values that are calculated from other attributes

eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute

Fig 85 Source httpcnxorgcontentm28250latest

Chapter 8 28

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 34: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set

8141 Candidate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull a simple or composite key that is unique and minimal

bull unique ndash no two rows in a table may have the same value at any time

bull minimal ndash every column must be necessary for uniqueness

bull For example for the entity

Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)

Possible candidate keys are EID SIN

FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment

EID SIN are also candidate keys

8142 Composite key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Composed of more than one attribute

bull For example as discussed earlier

First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department

bull A composite key can have two or more attributes but it must be minimal

29

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 35: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

8143 Primary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null

bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key

8144 Secondary key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc

bull all other candidate keys not chosen as the primary key

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key

8145 Alternate key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull all other candidate keys not chosen as the primary key

8146 Foreign key

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An attribute in one table that references the primary key of another table OR itcan be null

bull Both foreign and primary keys must be of the same data type

Chapter 8 30

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 36: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key

8147 Nulls

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)

bull No data entry

bull Not permitted in primary key

bull Should be avoided in other attributes

bull Can represent

bull An unknown attribute value

bull A known but missing attribute value

bull A ldquonot applicablerdquo condition

bull Can create problems when functions such as COUNT AVERAGE and SUMare used

bull Can create logical problems when relational tables are linked

NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)

PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000

SELECT emp FROM salary_tbl

WHERE jobname = lsquoSalesrsquo AND

(commission + salary) gt 30000 ndashgt E10 and E12

This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below

31

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 37: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

SELECT emp FROM salary_tbl

WHERE jobName = lsquoSalesrsquo AND

(commission gt 30000 OR salary gt 30000 OR

(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13

815 Entity Types

8151 Existence Dependency

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An entityrsquos existence is dependent on the existence of the related entity

bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null

bull Spouse entity is existence dependent on the employee

8152 Weak Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity

bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist

8153 Strong Entities

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity

bull Kernels are strong entities

bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity

Chapter 8 32

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 38: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are the glue that holds the tables together They are used to connect togetherrelated information in other tables

8161 1M relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be the norm in any relational database design

bull Relational database norm

bull Found in any database environment

eg One department has many employees

Fig 86 Source httpcnxorgcontentm28250latest

8162 11 relationship

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Should be rare in any relational database design

bull One entity can be related to only one other entity and vice versa

bull Could indicate that two entities actually belong in the same table

eg A professor chairs one department and a department has only one chair

8163 11 Example

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull An employee is associated with one spouse and one spouse is associated withone employee

33

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 39: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull Cannot be implemented as such in the relational model

bull MN relationships can be changed into two 1M relationships

bull Can be implemented by breaking it up to produce a set of 1M relationships

bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity

bull A student has many classes and a class can hold many students

bull Implementation of a composite entity

bull Yields required MN to 1M conversion

bull Composite entity table must contain at least the primary keys of original tables

bull Linking table contains multiple occurrences of the foreign key values

bull Additional attributes may be assigned as needed

8164 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S

8165 MN relationships

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 87 Source httpcnxorgcontentm28250latest

Chapter 8 34

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 40: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One in which a relationship exists between occurrences of the same entity set

The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship

In some entities a separate column can be created that refers to the primary key ofthe same entity set

Fig 88 Source httpcnxorgcontentm28250latest

818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest

Fig 89 Source httpcnxorgcontentm28250latest

35

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 41: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Relationship strength is based on how the primary key of a related entity is defined

Weak Relationships (non-identifying relationship)

bull Exists if the PK of the related entity does not contain a PK component of theparent entity

bull Customer(custid custname)

bull Order(orderID custid date)

Strong Relationship (identifying)

bull Exists when the PK of the related entity contains the PK component of the parententity

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTimehellip)

Chapter 8 36

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 42: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 9 Integrity Rules andConstraints

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics

91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints

There are several kinds of integrity constraints

Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone

Referential integrity ndash a foreign key must have a matching primary key or it must benull

This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint

92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In the CustomerOrder database

bull Customer(custid custname)

bull Order(orderID custid OrderDate)

37

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 43: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database

bull Course(CrsCode DeptCode Description)

bull Class(CrsCode Section ClassTime)

The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity

When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin

93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table

94 Referential Integrity using Transact SQL (MS SQLServer)

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

CREATE TABLE Customer

Chapter 9 38

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 44: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

( CustID INTEGER PRIMARY KEY CustName CHAR(35) )

CREATE TABLE Orders

( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )

The referential integrity is set when creating the table (Orders) with the FK

95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK

DELETE

bull RESTRICT

bull CASCADE

bull SET TO NULL

UPDATE

bull RESTRICT

bull CASCADE

In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable

Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager

951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules

39

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 45: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users

Some examples of business rules are

bull A teacher can teach many students

bull A class can have a maximum of 35 students

bull A course can be taught many times but by only one instructor

bull Not all teachers teach classes etc

96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality

961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by

The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples

Chapter 9 40

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 46: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence does not require a corresponding entity occurrence in arelationship

This shows a zero or many The many side is optional

Fig 91

This can also be read as

Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity

This shows a zero or one The 1 side is optional

98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One entity occurrence requires a corresponding entity occurrence in a relationship

This example shows one or many The many side is mandatory

41

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 47: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 92

This example shows one or many The many side is mandatory

Fig 93

So far we have seen that the right hand side can have

a 0 (zero) cardinality and a connectivity of many or one

The left hand side can also have a cardinality

of 0 (zero)

However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two

Chapter 9 42

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 48: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values

Fig 94

981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK

Fig 95

43

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 49: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at

bull basic theory and definition of functional dependencies

bull methodology for improving schema designs (normalization)

The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design

101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)

In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether

Consider the following table defining bank accountsbranches

Fig 101 Source httpcnxorgcontentm28252latest

Chapter 10 44

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 50: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When we insert a new record we need to check that branch data is consistent withexisting rows

Fig 102 Source httpcnxorgcontentm28252latest

103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If a branch changes address we need to update all rows referring to that branch

Fig 103 Source httpcnxorgcontentm28252latest

104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If we remove information about the last account at a branch (Downtown) all of thebranch information disappears

45

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 51: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 104 Source httpcnxorgcontentm28252latest

The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)

The Bank Accounts tables should appear as follows

Fig 105

This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded

Source httpcnxorgcontentm28252latest

Another Example

Employee Project Table

Assumptions EmpID and ProjectID is a composite PK

Project ID determines Budget ie Project P1 has a budget of 32 hours

Fig 106

Chapter 10 46

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 52: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects

Fig 107

Tables with Data

Fig 108

1 No anomalies will be created if a budget is changed

2 No dummy values are needed for projects that have no employees assigned

3 If an employeersquos contribution is deleted no important data is lost

4 No anomalies are created if an employeersquos contribution is added

The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies

47

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 53: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y

X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table

Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title

111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Consider the following instance r(R) of the relation schema R(ABCDE)

Fig 111

What kind of dependencies can we observe among the attributes in Table R

bull Since the values of A are unique it follows from the FD definition that

bull A rarrB A rarrC A rarrD A rarrE

Chapter 11 48

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 54: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull It also follows that A rarrBC (or any other subset of ABCDE)

bull This can be summarized as A rarrBCDE

bull From our understanding of primary keys A is a Primary Key

bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE

Other observations

combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C

When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants

112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong

Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y

Source httpenwikipediaorgwikiArmstrong27s_axioms

1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If Y is a subset of X then X determines Y

e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)

X X Y

49

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 55: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull Also known as a Partial dependency

Studentno course mdashgt studentName address city prov pc grade dateCompleted

This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo

To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc

1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in

StudentNomdashgt studentNameaddress city prov pc ProgramID

ProgramID mdashgt ProgramName

113 Additional rules

1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 112

If X determines Y and X determines Z then X must also determines Y and Z

Chapter 11 50

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 56: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together

SIN mdashgt EmpName

SIN mdashgt SpouseName

You may want to join these two tables into one

SIN ndashgt EmpName SpouseName

Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table

1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 113

If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables

114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Fig 114

A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified

ProjectNo and EmpNo combined is the PK

Partial Dependencies

ProjectNo mdashgt ProjName

EmpNo mdashgt EmpName DeptNo HrsWork

Transitive Dependency

DeptNo mdashgt DeptName

51

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 57: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization is the branch of relational theory providing design insights The goals ofnormalization are

bull be able to characterize the level of redundancy in a relational schema

bull provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems

121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used

1stNormal Form

2ndNormal Form

3rdNormal Form

BCNF

First Normal Form

In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups

To normalize a relation that contains a repeating group remove the repeating groupand form two new relations

The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification

Chapter 12 52

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 58: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)

12111 Process for 1NF

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull In the Student Grade Report table the repeating group is the course informationA student can take many courses

bull Remove the repeating group In this case itrsquos the Course information for eachstudent

bull Identify the Primary Key in your new table

bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)

bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)

bull The Student table is in First Normal Form (repeating group removed)

Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)

12112 Update anomalies in 1st normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)

bull To add a new course we need a student

bull When Course information needs to be updated we may have inconsistencies

bull To delete a student we might also delete critical information about a course

122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation must first be in 1stNF

53

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 59: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute

bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)

1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull To move on to 2ndNF the table must be in 1stNF

bull The student table is already in 2ndNF because it has a single column PK

bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade

bull Identify the new table that contains the course information

bull Identify the PK for the new table

Student (Studentnostudent_namemajor)

Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)

1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull When adding a new instructor we need a course

bull Updating course information could lead to inconsistencies for instructorinformation

bull Deleting a course may also delete instructor information

123 3rd Normal Form

1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

bull The relation has to be in 2nd normal form

Chapter 12 54

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 60: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute

bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship

bull Create new table(s) with removed dependency

bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies

Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)

12311 Update anomalies in 3rd normal form

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

At this stage there should be no anomalies in 3rd normal form

Lets look at the dependency diagrams for this example

The first step is to remove repeating groups As discussed above

Student (Student_no student_name major)

Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)

To recap the normalization process for the School database the followingdependencies are shown in the diagram

Fig 121

PD ndash Partial Dependency

TD ndash Transitive DependencyFD ndash Full Dependency

55

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 61: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key

Consider the following table

St_Maj_Adv table

Semantic Rules

Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major

Functional dependencies

Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)

Fd2 Advisor mdashmdashgt Major (not a candidate key)

ANOMALIES

Deleting student deletes advisor info

Insert a new advisor ndash need a student

Update ndash inconsistencies

Chapter 12 56

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 62: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Note no single attribute is a candidate key

Primary key can be student_idmajor or student_idadvisor

To reduce the St_Maj_Adv relation to BCNF create two new tables

1 St_Adv (Student_idAdvisor)

2 Adv_Maj (AdvisorMajor)

St_Adv

Adv_Maj

57

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 63: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Client Interview Table

Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to

create a table that incorporates the first 3 Fds and another table for the 4thFd

Interview

StaffRoom

Chapter 12 58

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 64: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 122 Source httpdbgrussellorgsection008html

12331 Normalization and Database Design

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized

Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships

Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD

It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently

59

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 65: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 13 Database DevelopmentProcess

Available under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase

Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next

We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows

bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements

Chapter 13 60

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 66: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized

bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed

bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use

bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)

bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited

httpcnxorgcontentm28139latest

132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions

bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database

bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema

bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

61

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 67: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach

httpcnxorgcontentm28139latest

Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users

Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements

The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database

Chapter 13 62

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 68: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items

134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points

The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented

The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el

We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database

63

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 69: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent

The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view

First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined

Chapter 13 64

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 70: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests

Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation

It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development

At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow

In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In

65

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 71: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations

137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)

One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes

138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database

For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS

Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data

Chapter 13 66

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 72: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading

httpwwwoercommonsorgcoursesthe-database-development-life-cycleview

139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)

bull Document all entities discovered during the information gathering stage

bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key

bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)

bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel

bull Verify E-R modeling by normalizing tables

67

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators
Page 73: Database Design This document was created with Prince, a ... · • A database is designed, built, and populated with data for a specific purpose. • Each data item is stored in

Chapter 14 Database Users

141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

These are people whose jobs require access to a database for querying updating andgenerating reports

An end user might be one of the following

The application user uses the existing application programs to perform their dailytasks

Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess

142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task

143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http

creativecommonsorglicensesby-sa40)

A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system

Chapter 14 68

  • 1 Before the Advent of Database Systems
    • 11 File-based approach for banking system
    • 12 Data Redundancy
    • 13 Data Isolation
    • 14 Integrity problems
    • 15 Security problems
    • 16 Concurrency ndash access anomalies
    • 17 Database Approach
    • 18 Role of databases in Business
    • 19 Meaning of Data
      • 2 Fundamental Concepts
      • 3 Characteristics and Benefits of a Database
        • 31 Self-Describing Nature of a Database System
        • 32 Insulation between Program and Data
        • 33 Support multiple views of data
        • 34 Sharing of data and Multiuser system
        • 35 Control Data Redundancy
        • 36 Data Sharing
        • 37 Enforcing Integrity Constraints
        • 38 Restricting Unauthorised Access
        • 39 Data Independence
        • 310 Transaction Processing
        • 311 Providing multiple views of data
        • 312 Providing backup and recovery facilities
        • 313 Managing information
          • 4 Types of Database Models
            • 41 High-level Conceptual Data models
              • 411 Record-based Logical Data models
                  • 5 Data Modelling
                    • 51 Degree of Data Abstraction
                    • 52 External models
                    • 53 Conceptual model
                    • 54 Internal model
                    • 55 Physical model
                      • 551 Data Abstraction Layer
                      • 552 Schemas
                      • 553 Logical and Physical Data Independence
                      • 554 Logical data independence
                      • 555 Physical data independence
                          • 6 Classification of Database Systems
                            • 61 Based on data model
                            • 62 Based on the number of users
                            • 63 Based on the ways database is distributed
                              • 631 Centralized Systems
                              • 632 Distributed database system
                              • 633 Homogeneous distributed Database Systems
                              • 634 Heterogeneous distributed Database Systems
                                  • 7 The Relational Data Model
                                    • 71 Fundamental concepts in Relational Data Model
                                      • 711 Relation
                                      • 712 Table
                                      • 713 Column
                                      • 714 Domain
                                      • 715 Records
                                        • 72 Degree
                                        • 73 Properties of Tables
                                          • 731 Terminology
                                              • 8 Entity Relationship Model
                                                • 81 Entity Entity Set and Entity Type
                                                  • 811 Types of Entities
                                                  • 812 Attributes
                                                  • 813 Types of Attributes
                                                  • 814 Keys
                                                    • 8141 Candidate key
                                                    • 8142 Composite key
                                                    • 8143 Primary key
                                                    • 8144 Secondary key
                                                    • 8145 Alternate key
                                                    • 8146 Foreign key
                                                    • 8147 Nulls
                                                      • 815 Entity Types
                                                        • 8151 Existence Dependency
                                                        • 8152 Weak Entities
                                                        • 8153 Strong Entities
                                                          • 816 Relationships
                                                            • 8161 1M relationship
                                                            • 8162 11 relationship
                                                            • 8163 11 Example
                                                            • 8164 MN relationships
                                                            • 8165 MN relationships
                                                              • 817 Unary Relationships (recursive)
                                                              • 818 Ternary Relationships
                                                              • 819 Relationship Strength
                                                                  • 9 Integrity Rules and Constraints
                                                                    • 91 Domain Integrity
                                                                    • 92 Referential integrity Examples
                                                                    • 93 Referential Integrity in MS Access
                                                                    • 94 Referential Integrity using Transact SQL (MS SQL Server)
                                                                    • 95 Foreign Key Rules
                                                                      • 951 Business Rules
                                                                        • 96 Cardinality
                                                                          • 961 Relationship Participation
                                                                            • 97 Optional
                                                                            • 98 Mandatory
                                                                              • 981 Relationship Types
                                                                                  • 10 ER Modelling
                                                                                    • 101 Relational Design and Redundancy
                                                                                    • 102 Insertion anomaly
                                                                                    • 103 Update anomaly
                                                                                    • 104 Deletion anomaly
                                                                                      • 11 Functional Dependencies
                                                                                        • 111 Rules of Functional Dependencies
                                                                                        • 112 Inference Rules
                                                                                          • 1121 Axiom of reflexivity
                                                                                          • 1122 Axiom of augmentation
                                                                                          • 1123 Axiom of transitivity
                                                                                            • 113 Additional rules
                                                                                              • 1131 Union
                                                                                              • 1132 Decomposition
                                                                                                • 114 Dependency Diagram
                                                                                                  • 12 Normalization
                                                                                                    • 121 Normal Forms
                                                                                                      • 1211 School database
                                                                                                        • 12111 Process for 1NF
                                                                                                        • 12112 Update anomalies in 1st normal form
                                                                                                            • 122 2nd Normal Form
                                                                                                              • 1221 Process for 2NF
                                                                                                              • 1222 Update anomalies in 2nd normal form
                                                                                                                • 123 3rd Normal Form
                                                                                                                  • 1231 Remove Transitive Dependencies
                                                                                                                    • 12311 Update anomalies in 3rd normal form
                                                                                                                      • 1232 Boyce-Codd Normal Form (BCNF)
                                                                                                                      • 1233 BCNF Example 2
                                                                                                                        • 12331 Normalization and Database Design
                                                                                                                          • 13 Database Development Process
                                                                                                                            • 131 SDLC ndash Waterfall
                                                                                                                            • 132 Database Life Cycle
                                                                                                                            • 133 Requirements gathering
                                                                                                                            • 134 Analysis
                                                                                                                            • 135 Logical Design
                                                                                                                            • 136 Implementation
                                                                                                                            • 137 Realising the design
                                                                                                                            • 138 Populating the database
                                                                                                                            • 139 Guidelines For Developing An ER Diagram
                                                                                                                              • 14 Database Users
                                                                                                                                • 141 End users
                                                                                                                                • 142 Application Programmers
                                                                                                                                • 143 Database Administrators